MATRIX NORMS AND RAPID MIXING FOR SPIN SYSTEMS

By Martin Dyer, Leslie Ann Goldberg and Mark Jerrum 

University of Leeds, University of Liverpool and University of London 

We give a systematic development of the application of matrix 
norms to rapid mixing in spin systems. We show that rapid mix- 
ing of both random update Glauber dynamics and systematic scan 
Glauber dynamics occurs if any matrix norm of the associated depen- 
dency matrix is less than 1. We give improved analysis for the case 
in which the diagonal of the dependency matrix is 0 (as in heat bath
dynamics). We apply the matrix norm methods to random update 
and systematic scan Glauber dynamics for coloring various classes of 
graphs. We give a general method for estimating a norm of a symmet- 
ric nonregular matrix. This leads to improved mixing times for any 
class of graphs which is hereditary and sufficiently sparse including 
several classes of degree-bounded graphs such as nonregular graphs, 
trees, planar graphs and graphs with given tree-width and genus. 

1. Introduction. A spin system consists of a finite set of sites and a 
finite set of spins. A configuration is an assignment of a spin to each site. 
Sites interact locally, and these interactions specify the relative likelihood of 
possible (local) subconfigurations. Taken together, these give a well-defined 
probability distribution $\pi$ on the set of configurations.

Glauber dynamics is a Markov chain whose states are configurations. In 
the transitions of the Markov chain, the spins are updated one at a time. 
The Markov chain converges to the stationary distribution $\pi$. During each
transition of random update Glauber dynamics, a site is chosen uniformly at 
random and a new spin is chosen from an appropriate probability distribu- 
tion (based on the local subconfiguration around the chosen site). During a 
transition of systematic scan Glauber dynamics, the sites are updated in a 
(deterministic) systematic order, one after another. Again, the updates are 
from an appropriate probability distribution based on the local subconfigu- 
ration. 

It is well known that the mixing times of random update Glauber dynamics
and systematic scan Glauber dynamics can be bounded in terms of the
influences of sites on each other. A dependency matrix for a spin system
with $n$ sites is an $n \times n$ matrix $R$ in which $R_{ij}$ is an upper
bound on the influence (defined below) of site $i$ on site $j$.

An easy application of the path coupling method of Bubley and Dyer
shows that if the $L_\infty$ norm of $R$ (which is its maximum row sum and is
written $\|R\|_\infty$) is less than 1 then random update Glauber dynamics is
rapidly mixing. The same is true if the $L_1$ norm (the maximum column
sum of $R$, written $\|R\|_1$) is less than 1. The latter condition is known
as the Dobrushin condition. Dobrushin [11] showed that if $\|R\|_1 < 1$, then
the corresponding countable spin system has a unique Gibbs measure. As we now
know (see Weitz [39]), there is a very close connection between rapid mixing
of Glauber dynamics for finite spin systems and uniqueness of Gibbs measure
for the corresponding countable systems. Dobrushin and Shlosman [12] were
the first to establish uniqueness when $\|R\|_\infty < 1$. Their analysis
extends to block dynamics but we will stick to Glauber dynamics in this paper.
For an extension of some of our ideas to block dynamics, see [30].

The Dobrushin condition $\|R\|_1 < 1$ implies that systematic scan is rapidly
mixing. A proof follows easily from the account of Dobrushin uniqueness in
Simon's book [35], some of which is derived from the account of Föllmer [19].
In [14], we showed that $\|R\|_\infty < 1$ also implies rapid mixing of
systematic scan Glauber dynamics. [14], Section 3.5, notes that it is possible
to prove rapid mixing by observing a contraction in other norms besides the
$L_1$ norm and the $L_\infty$ norm. This idea was developed by Hayes [22], who
showed that rapid mixing occurs when the spectral norm $\|R\|_2$ is less than
one. For symmetric matrices, the spectral norm is equal to the largest
eigenvalue of $R$, $\lambda(R)$. So, for symmetric matrices, [22] gives rapid
mixing when $\lambda(R) < 1$. In general, $\|R\|_2/\lambda(R)$ can be
arbitrarily large, see Section 2.1.

In this paper, we give a systematic development of the application of 
matrix norms to rapid mixing. We first show that rapid mixing of random 
update Glauber dynamics occurs if any matrix norm is less than 1. Formally, 
we prove the following, where $J_n$ is the norm of the all 1's matrix. All
definitions are given in Section 2. 

Lemma 1. Let $R$ be a dependency matrix for a spin system, and let $\|\cdot\|$
be any matrix norm such that $\|R\| \le \mu < 1$. Then the mixing time of
random update Glauber dynamics is bounded by

$$\hat\tau_r(\varepsilon) \sim n(1-\mu)^{-1} \ln((1-\mu)^{-1} J_n/\varepsilon).$$

We prove a similar result for systematic scan Glauber dynamics.

Lemma 2. Let $R$ be a dependency matrix for a spin system and $\|\cdot\|$ any
matrix norm such that $\|R\| \le \mu < 1$. Then the mixing time of systematic
scan Glauber dynamics is bounded by

$$\hat\tau_s(\varepsilon) \sim (1-\mu)^{-1} \ln((1-\mu)^{-1} J_n/\varepsilon).$$

The chief benefit of the new lemmas is that they can be used to show 
rapid mixing whenever the dependency matrix has any norm which is less 
than 1, even if the norms which are mentioned in previous theorems are 
not less than 1. Section 2.3 gives an example of a spin system for which 
Lemmas 1 and 2 can be used to prove rapid mixing, while previous theo- 
rems are inapplicable. The point of the lemmas is that rapid mixing occurs 
whenever any matrix norm is bounded — specific properties of the norm are 
not relevant. 

Section 3.1 uses path coupling to prove Lemmas 1 and 2. Despite his- 
torical differences, the path-coupling approach is essentially equivalent to 
Dobrushin uniqueness. To demonstrate the relationship between the ap- 
proaches, we again prove the same lemmas using Dobrushin uniqueness in 
Section 3.2. We also give an improved analysis for the case in which the 
diagonal of $R$ is 0, which is the case for heat bath dynamics. We prove the
following. 

Lemma 3. Let $R$ be symmetric with zero diagonal and $\|R\|_2 = \lambda(R) =
\lambda < 1$. Then the mixing time of systematic scan is at most

$$\hat\tau_s(\varepsilon) \sim (1 - \tfrac{1}{2}\lambda)(1 - \lambda)^{-1} \ln((1 - \lambda)^{-1} n/\varepsilon).$$

An interesting observation is that when $\lambda(R)$ is close to 1, the number
of Glauber steps given in the upper bound from Lemma 3 is close to half
the number that we get in our best estimate for random update Glauber
dynamics (see Remark 6). Perhaps this can be interpreted as weak evidence
in support of the conjecture that systematic scan mixes faster than random
update for Glauber dynamics.
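
The "close to half" comparison can be illustrated numerically. The following is a minimal sketch under stated assumptions, not part of the paper's argument: it reads Lemma 3 as a bound in scans and multiplies by $n$ to count Glauber steps, and it uses the Lemma 1 bound with $J_n = n$ (the value for the spectral norm) as a stand-in for the random update estimate of Remark 6, which is not restated in this section. The function names are hypothetical.

```python
import math

def random_update_bound(n, mu, eps):
    """Lemma 1 style bound in single-site updates, assuming J_n = n."""
    return n / (1 - mu) * math.log(n / ((1 - mu) * eps))

def scan_bound_steps(n, lam, eps):
    """Lemma 3 style bound, converted from scans to single-site updates."""
    scans = (1 - lam / 2) / (1 - lam) * math.log(n / ((1 - lam) * eps))
    return n * scans  # one scan performs n single-site updates

def ratio(n, lam, eps=0.01):
    """Scan bound divided by random update bound; equals 1 - lam/2 here."""
    return scan_bound_steps(n, lam, eps) / random_update_bound(n, lam, eps)

if __name__ == "__main__":
    for lam in (0.5, 0.9, 0.99):
        print(lam, ratio(1000, lam))
```

Under these assumptions the two logarithmic factors coincide, so the ratio is exactly $1 - \lambda/2$, which tends to $1/2$ as $\lambda \to 1$.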

1.1. Applications. The study of spin systems originates in statistical 
physics. Configurations in spin systems are used to model configurations in 
physical systems involving interacting particles. Rapid mixing is important 
for two reasons. 

(i) When Glauber dynamics is rapidly mixing, it can be used for sam- 
pling. Typically, we are interested in sampling configurations to learn about 
the equilibrium distribution. In particular, we are often interested in esti- 
mating the so-called partition function of the system. If Glauber dynamics 
is rapidly mixing, then a short simulation (of feasible length) yields a sam- 
ple distribution which is close to the equilibrium distribution. Otherwise, 
Glauber dynamics is an inappropriate means of producing samples. 

(ii) Rapid mixing of Glauber dynamics is strongly associated with qualitative
properties such as uniqueness of the infinite-volume Gibbs measure.
Infinite systems are beyond the scope of this paper, but it is interesting to 
note that there are rigorous proofs that rapid mixing of Glauber dynam- 
ics on finite systems often coincides with uniqueness, which is a qualitative 
property on infinite systems — the property of having one, rather than many, 
qualitative equilibria. See [27, 39] for more details about this fascinating con- 
nection. 

In computer science, rapid mixing has important applications to the com- 
putational complexity of counting problems and their relatives. While exact 
counting seems intractable in most cases, efficient sampling usually implies 
the possibility of efficient approximate counting [25]. In this area, consider- 
able attention has been paid to problems which are essentially spin systems, 
for example, colorings and independent sets in graphs [31]. Here the spe- 
cific dynamics is not important, only that it has polynomial mixing time. 
However, it is generally the case that if any dynamics mixes rapidly then 
so will the Glauber dynamics. This can usually be established using Markov 
chain comparison techniques [9, 17, 32]. Therefore, the Glauber dynamics 
still retains a central importance. 

Traditionally, rigorous analysis has focused on the mixing properties of 
random update Glauber dynamics, which is easier to analyze than system- 
atic scan Glauber dynamics. (See [1, 8, 16] for a discussion of some notable 
exceptions.) However, experimental work is often carried out using system- 
atic scan strategies. Thus, it is important to understand the mixing time 
of systematic scan Glauber dynamics. The observation that the Dobrushin 
condition implies that systematic scan is rapidly mixing (which is an
observation of Sokal) was an important breakthrough. This was extended in [14]
which showed that the Dobrushin-Shlosman condition (bounding the $L_\infty$
norm) also implies rapid mixing of systematic scan Glauber dynamics. Dyer,
Goldberg and Jerrum [14] gave an application to sampling proper colorings
of an arbitrary degree bounded graph. This is an important application in
computer science because colorings are used to model many combinatorial
structures such as assignments and timetables.

Hayes [22] gives applications of conditions of the Dobrushin type to various
related problems on graphs, using the norm $\|\cdot\|_2$. In [14], we observed
that the dependency matrix for the Glauber dynamics on graph colorings 
can be bounded by a multiple of the adjacency matrix of the graph. This 
was applied to analyzing the systematic scan dynamics for coloring near- 
regular graphs, and hence to regular graphs. Hayes extends the observation 
of [14] to the Glauber dynamics for the Ising and hard core models. He 
applies these observations with a new estimate of the largest eigenvalue of 
the adjacency matrix of a planar graph, obtaining an improved estimate
of the mixing time of these chains on planar graphs with given maximum
degree. He also applies them to bounded-degree trees, using an eigenvalue 
estimate due to Stevanovic [36], for which he provides a different proof. He 
extends these results to the systematic scan chain for each problem, using 
ideas taken from [14]. 

In Section 4, we apply the matrix norm methods developed here to the
random update Glauber dynamics and systematic scan dynamics for coloring
various classes of graphs. We give a general method for estimating the norm
$\|\cdot\|_2$ of a symmetric nonnegative matrix $R$. Our method is again based
on matrix norms. We show that there exists a "decomposition" $R = B + B^T$,
for some matrix $B$, where $\|B\|_1$, $\|B\|_\infty$ can be bounded in terms of
$\|R\|_1$ and the maximum density of $R$. The bounds on $\|B\|_1$,
$\|B\|_\infty$ can then be combined to bound $\|R\|_2$. In particular, our
methods allow us to give a common generalization of results of Hayes [22],
Stevanovic [36] and others for the maximum eigenvalue of certain graphs. In
most cases, we are also able to strengthen the previous results. In
particular, Corollaries 49(i) and 49(ii) improve the results of Stevanovic and
Zhang and Corollary 49(iv) improves a result of Hayes. Theorem 51 gives new
rapid-mixing results for sparse hereditary graph classes.

Using this, we obtain rapid mixing results for whole classes of graphs for
which previously no results improved on those for arbitrary degree-bounded
graphs. These results are summarized in Corollary 52.
Part (i) gives mixing time bounds for all connected graphs when $q$, the
number of spins, is equal to twice the degree, $\Delta$. The $q = 2\Delta$
boundary case is important and well studied. Part (i) improves the mixing time
bound given in Theorem 5 of [14] by a factor of $n$. Part (ii) gives mixing
time bounds for graphs with bounded tree-width. These extend results by
Martinelli, Sinclair and Weitz [28] which show rapid mixing for trees, but not
for graphs with higher treewidth (trees are graphs with treewidth 1).
Part (iii) gives mixing-time bounds for planar graphs. These improve the
results of
Hayes [22] which do not apply unless q is increased by a fixed multiple of ^. 
The goal is to get rapid-mixing results for $q$ as small as possible. For
trees, it is known that $q = \Delta + 3$ suffices, and it is an open question
how small $q$ can be as a function of $\Delta$ for these other graph classes.
Part (iv) improves our planar graphs results by extending them to general
graphs with bounded genus, rather than just to planar graphs. Prior to our
work, rapid mixing was known only for $q > 11\Delta/6$ [38].

2. Preliminaries. Let $[n] = \{1, 2, \ldots, n\}$, $\mathbb{N} = \{1, 2, 3,
\ldots\}$, and $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. We use $\mathbb{Z}$,
$\mathbb{R}$ for the integers and reals, and $\mathbb{R}_+$ for the nonnegative
reals. Let $|c|$ denote the absolute value of $c$.

2.1. Matrix norms. Let $\mathbb{M}_{mn} = \mathbb{R}^{m \times n}$ be the set
of real $m \times n$ matrices. We denote $\mathbb{M}_{nn}$, the set of square
matrices, by $\mathbb{M}_n$. The set of nonnegative matrices will be denoted by
$\mathbb{M}^+_{mn}$, and the set of square nonnegative matrices by
$\mathbb{M}^+_n$. We will write $0$ for the zero $m \times n$ matrix and $I$
for the $n \times n$ identity matrix. The dimensions of these matrices can
usually be inferred from the context, but where ambiguity is possible or
emphasis required, we will write $0_{m,n}$, $I_n$, etc. Whether vectors are row
or column will be determined either by context or explicit statement. The $i$th
component of a vector $v$ will be written both as $v_i$ and $v(i)$, whichever
is more convenient. If $R$ is a matrix and $v$ a vector, $Rv(i)$ will mean
$(Rv)_i$. We will use $J$ for the $n \times n$ matrix of 1's, $\mathbf{1}$ for
the column $n$-vector of 1's, and $\mathbf{1}^T$ for the row $n$-vector of 1's.
Again, the dimensions can be inferred from context.

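A matrix norm (see [23]) is a function $\|\cdot\| : \mathbb{M}_{mn} \to
\mathbb{R}_+$ for each $m, n \in \mathbb{N}$ such that:

(i) $\|R\| = 0$ and $R \in \mathbb{M}_{mn}$ if and only if $R = 0 \in
\mathbb{M}_{mn}$;

(ii) $\|\mu R\| = |\mu|\,\|R\|$ for all $\mu \in \mathbb{R}$ and $R \in
\mathbb{M}_{mn}$;

(iii) $\|R + S\| \le \|R\| + \|S\|$ for all $R, S \in \mathbb{M}_{mn}$;

(iv) $\|RS\| \le \|R\|\,\|S\|$ for all $R \in \mathbb{M}_{mk}$, $S \in
\mathbb{M}_{kn}$ ($k \in \mathbb{N}$).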

Note that property (iv) (submultiplicativity) is sometimes not required for
a matrix norm, but we will require it here. The condition that $\|\cdot\|$ be
defined for all $m, n$ is, in fact, a mild requirement. Suppose $\|\cdot\|$ is
initially defined only on $\mathbb{M}_n$, for any large enough $n$; then we can
define $\|R\|$ for $R \in \mathbb{M}_{mk}$ ($m, k \in [n]$) by "embedding" $R$
in $\mathbb{M}_n$, that is,

$$\|R\| = \left\| \begin{bmatrix} R & 0_{m,n-k} \\ 0_{n-m,k} & 0_{n-m,n-k} \end{bmatrix} \right\|.$$

It is straightforward to check that this definition gives the required
properties. For many matrix norms, this embedding norm coincides with the
actual norm for all $m, k \in [n]$.

Examples of matrix norms are operator norms, defined by $\|R\| = \max_{x \ne 0}
\|Rx\|_{\mathrm{vec}}/\|x\|_{\mathrm{vec}}$ for any vector norm
$\|x\|_{\mathrm{vec}}$ defined on $\mathbb{R}^n$ for all $n \in \mathbb{N}$.
Observe that we denote a matrix norm by $\|\cdot\|$ and a vector norm by
$\|\cdot\|_{\mathrm{vec}}$. Since vector norms occur only in this section, this
should not cause confusion. In fact, their meanings will also be very close,
as we discuss below.

For any operator norm, we clearly have $\|I\| = 1$. The norms $\|\cdot\|_1$,
$\|\cdot\|_2$ and $\|\cdot\|_\infty$ are important examples, derived from the
corresponding vector norms. The norm $\|R\|_1$ is the maximum column sum of
$R$, $\|R\|_\infty$ is the maximum row sum, and the spectral norm $\|R\|_2 =
\sqrt{\lambda}$, where $\lambda$ is the largest eigenvalue of $R^T R$. (See
[23], pages 294-295, but observe that $|||\cdot|||$ is used for what we denote
here by $\|\cdot\|$.) The Frobenius norm $\|R\|_F = \sqrt{\sum_{i,j} R_{ij}^2}$
(see [23], page 291) is an example of a matrix norm which is not an operator
norm. Note that $\|I\| = \sqrt{n}$ for the Frobenius norm, so it cannot be
defined as an operator norm.
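
These four norms are easy to compute directly. The following is a minimal sketch in plain Python, with hypothetical function names; matrices are lists of rows, and the spectral norm is approximated by power iteration on $R^T R$, which is an illustrative choice rather than anything prescribed by the text. The starting vector is the normalized all-ones vector, which is adequate for nonnegative matrices but is an assumption in general.

```python
import math

def norm_1(R):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in R) for j in range(len(R[0])))

def norm_inf(R):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in R)

def norm_frobenius(R):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in R for v in row))

def norm_2(R, iterations=500):
    """Approximate sqrt of the largest eigenvalue of R^T R by power iteration."""
    m, n = len(R), len(R[0])
    x = [1.0 / math.sqrt(n)] * n          # unit-length start vector
    lam = 0.0
    for _ in range(iterations):
        y = [sum(R[i][j] * x[j] for j in range(n)) for i in range(m)]  # y = R x
        z = [sum(R[i][j] * y[i] for i in range(m)) for j in range(n)]  # z = R^T y
        lam = math.sqrt(sum(v * v for v in z))  # estimate of top eigenvalue
        if lam == 0.0:
            return 0.0
        x = [v / lam for v in z]
    return math.sqrt(lam)

if __name__ == "__main__":
    R = [[1.0, 2.0], [3.0, 4.0]]
    print(norm_1(R), norm_inf(R), norm_frobenius(R), norm_2(R))
```

For the identity matrix the first, second and fourth functions return 1, while the Frobenius norm returns $\sqrt{n}$, in line with the remark that it is not an operator norm.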

New matrix norms can also be created easily from existing ones. If $W_n \in
\mathbb{M}_n$ is a fixed nonsingular matrix for each $n$, then $\|\cdot\|_W =
\|W_m(\cdot)W_n^{-1}\|$ is a matrix norm. (See [23] page 296.) Note that
$\|\cdot\|_W$ is an operator norm whenever $\|\cdot\|$ is, since it is induced
by the vector norm $\|W\,\cdot\,\|_{\mathrm{vec}}$.

The following relate matrix norms to absolute values and corresponding 
vector norms. 

Lemma 4. Suppose $c \in \mathbb{R}$. Let $\|\cdot\|$ be a matrix norm on
$1 \times 1$ matrices. Then $|c| \le \|c\|$.

Proof. This follows from the axioms for a matrix norm. First, $\|c\| =
\|c \times 1\| = |c|\,\|1\|$ by (ii). Also, $\|c\| = \|c \times 1\| \le
\|c\|\,\|1\|$ by (iv). Finally, $\|1\| \ne 0$ by (i). □

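Lemma 5. Suppose $x$ is a column vector, $\|\cdot\|_{\mathrm{vec}}$ a vector
norm and $\|\cdot\|$ the corresponding operator norm. Then
$\|x\|_{\mathrm{vec}} = \|1\|_{\mathrm{vec}}\,\|x\|$.

Proof. Let $x$ be a length-$\ell$ column vector. $\|x\|_{\mathrm{vec}}$ is the
vector norm applied to $x$, $\|1\|_{\mathrm{vec}}$ is the same norm applied to
the length-1 column vector containing a single 1. $\|x\|$ is the operator norm
applied to the $\ell \times 1$ matrix containing the single column $x$. Then
$\|x\| = \max_{\alpha \ne 0} \|x\alpha\|_{\mathrm{vec}}/
\|\alpha\|_{\mathrm{vec}}$ where $\alpha$ is a nonzero real number. Pulling
constants out of the vector norm, $\max_{\alpha \ne 0}
\|x\alpha\|_{\mathrm{vec}}/\|\alpha\|_{\mathrm{vec}} = \|x\|_{\mathrm{vec}}/
\|1\|_{\mathrm{vec}}$. □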

The dual (or adjoint [23], page 309) norm $\|\cdot\|^*$ of a matrix norm
$\|\cdot\|$ will be defined by $\|R\|^* = \|R^T\|$. Thus, $\|\cdot\|_1$ and
$\|\cdot\|_\infty$ are dual, and $\|\cdot\|_2$ is self-dual. Note that, for any
column vector $x$, $\|x^T\| = \|x\|^*$ so, for example, $\|x^T\|_1 =
\|x\|_\infty$. Clearly, any matrix norm $\|\cdot\|$ induces a vector norm on
column vectors. Then the dual matrix norm, as defined here, is closely related
to the dual vector norm, which is defined by

$$\|x\|^*_{\mathrm{vec}} = \max_{y \ne 0} \frac{|x^T y|}{\|y\|_{\mathrm{vec}}}.$$

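Lemma 6. Suppose $x$ is a column vector, $\|\cdot\|_{\mathrm{vec}}$ a vector
norm, and $\|\cdot\|$ the corresponding operator norm. Then
$\|x\|^*_{\mathrm{vec}} = \|1\|^*_{\mathrm{vec}}\,\|x\|^*$.

Proof. By definition, $\|1\|^*_{\mathrm{vec}} = \max_{\alpha \ne 0}
|\alpha|/\|\alpha\|_{\mathrm{vec}} = 1/\|1\|_{\mathrm{vec}}$, after pulling out
constants, and

$$\|1\|_{\mathrm{vec}}\,\|x\|^*_{\mathrm{vec}} = \|1\|_{\mathrm{vec}} \max_{y \ne 0} \frac{|x^T y|}{\|y\|_{\mathrm{vec}}} = \max_{y \ne 0} \frac{|x^T y|\,\|1\|_{\mathrm{vec}}}{\|y\|_{\mathrm{vec}}} = \max_{y \ne 0} \frac{\|x^T y\|_{\mathrm{vec}}}{\|y\|_{\mathrm{vec}}} = \|x^T\| = \|x\|^*.$$ □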

With any matrix $R = (R_{ij}) \in \mathbb{M}_n$ we can associate a weighted
digraph $G(R)$ with vertex set $[n]$, edge set $E = \{(i, j) \in [n]^2 :
R_{ij} \ne 0\}$, and $(i, j) \in E$ has weight $R_{ij}$. The (zero-one)
adjacency matrix of $G(R)$ will be denoted by $A(R)$. If $G(R)$ is labeled so
that each component has consecutive numbers, then $R$ is block diagonal and the
(principal) blocks correspond to the components of $G(R)$. A block is
irreducible if the corresponding component of $G(R)$ is strongly connected.
Note, in particular, that $R$ is irreducible if $R > 0$. If $R$ is symmetric,
$G(R)$ is an undirected graph and $R$ is irreducible when $G(R)$ is connected.
For $i, j \in V$, $d(i, j)$ will denote the number of edges in a shortest
directed path from $i$ to $j$. If there is no such path, $d(i, j) = \infty$.
The diameter of $G$, $D(G) = \max_{i,j \in V} d(i, j)$. Thus, $G$ is strongly
connected when $D(G) < \infty$.

For $R \in \mathbb{M}^+_n$, let $\lambda(R)$ denote the largest eigenvalue (the
spectral radius). We know that $\lambda(R) \in \mathbb{R}_+$ from
Perron-Frobenius theory [34], Chapter 1. We use the following facts about
$\lambda(R)$. The first is a restatement of [34], Theorem 1.6, a version of the
Perron-Frobenius theorem.

Lemma 7. If $R \in \mathbb{M}^+_n$ is irreducible, there exists a row vector
$w > 0$ satisfying $wR \le \mu w$ if and only if $\mu \ge \lambda(R)$. If
$\mu = \lambda(R)$, then $w$ is the unique left eigenvector of $R$ for the
eigenvalue $\lambda$.

Lemma 8. If $R \in \mathbb{M}^+_n$ has blocks $R_1, R_2, \ldots, R_k$, then
$\lambda(R) = \max_{1 \le i \le k} \lambda(R_i)$.

Lemma 9 (See [34], Theorem 1.1). If $R, R' \in \mathbb{M}^+_n$ and $R \le R'$,
then $\lambda(R) \le \lambda(R')$.



$\lambda(\cdot)$ is not a matrix norm. For example,

$$\lambda\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = 0,$$

so axiom (i) in the definition of a matrix norm is violated by
$\lambda(\cdot)$. Nevertheless, $\lambda(R)$ is a lower bound on the value of
any norm of $R$.

Lemma 10 (See [23], Theorem 5.6.9). If $R \in \mathbb{M}^+_n$, then
$\lambda(R) \le \|R\|$ for any matrix norm $\|\cdot\|$.

Furthermore, for every $R \in \mathbb{M}^+_n$ there is a norm $\|\cdot\|$,
depending on $R$, such that the value of this norm coincides with
$\lambda(\cdot)$ when evaluated at $R$.

Lemma 11. For any irreducible $R \in \mathbb{M}^+_n$, there exists a matrix
norm $\|\cdot\|$ such that $\lambda(R) = \|R\|$.

Proof. Let $w > 0$ be a left eigenvector for $\lambda = \lambda(R)$, and let
$W = \mathrm{diag}(w) \in \mathbb{M}^+_n$. Then $\|\cdot\|_w =
\|W(\cdot)W^{-1}\|_1$ is the required norm, since

$$\|R\|_w = \|WRW^{-1}\|_1 = \|wRW^{-1}\|_1 = \lambda\|wW^{-1}\|_1 = \lambda\|\mathbf{1}^T\|_1 = \lambda.$$ □



The norm $\|\cdot\|_w$ defined in the proof of Lemma 11 is the minimum matrix
norm for $R$, but this norm is clearly dependent upon $R$ since $w$ is.
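
The construction in the proof of Lemma 11 can be tried out numerically. The sketch below is illustrative only and its function names are hypothetical: it approximates a positive left eigenvector $w$ by power iteration on $R + I$ (an assumption made here so that the iteration also converges for periodic irreducible matrices; $R + I$ has the same eigenvectors as $R$), and then evaluates $\|WRW^{-1}\|_1$ with $W = \mathrm{diag}(w)$.

```python
def left_perron_vector(R, iterations=2000):
    """Approximate w > 0 with w R = lambda w for an irreducible nonnegative R."""
    n = len(R)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # One step of power iteration with the row vector w and the matrix R + I.
        new = [w[j] + sum(w[i] * R[i][j] for i in range(n)) for j in range(n)]
        total = sum(new)
        w = [v / total for v in new]      # keep the entries summing to 1
    return w

def weighted_norm(R, w):
    """||W R W^{-1}||_1 with W = diag(w): max column sum of w_i |R_ij| / w_j."""
    n = len(R)
    return max(sum(w[i] * abs(R[i][j]) / w[j] for i in range(n))
               for j in range(n))

if __name__ == "__main__":
    R = [[0.1, 0.9],
         [0.2, 0.1]]
    w = left_perron_vector(R)
    print(weighted_norm(R, w))
```

For this $2 \times 2$ example the maximum row sum and maximum column sum are both 1, while the weighted norm comes out close to $\lambda(R) = 0.1 + \sqrt{0.18} \approx 0.524$.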

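The numerical radius [23] of $R \in \mathbb{M}^+_n$ is defined as $\nu(R) =
\max\{x^T R x : x^T x = 1\}$. $\nu(\cdot)$ is not submultiplicative since

$$\nu\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \nu\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \frac{1}{2},$$

but applying $\nu$ to the product gives

$$\nu\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = 1.$$

Thus, $\nu$ is not a matrix norm in our sense. Nevertheless, $\nu(R)$ provides
a lower bound on the norm $\|R\|_2$.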

Lemma 12. $\lambda(R) \le \nu(R) \le \|R\|_2$, with equality throughout if $R$
is symmetric.

Proof. Let $w$, with $\|w\|_2 = 1$, be an eigenvector for $\lambda =
\lambda(R)$. Then $\nu(R) \ge w^T R w = \lambda w^T w = \lambda(R)$. Also,
$\nu(R) = x^T R x \le \|R\|_2$ for some $x$ with $\|x\|_2 = 1$, and $x^T R x =
\|x^T R x\|_2 \le \|R\|_2$ since $\|\cdot\|_2$ is submultiplicative. If $R$ is
a symmetric matrix, then $R = Q^T \Lambda Q$, for $Q$ orthonormal and $\Lambda$
a diagonal matrix of eigenvalues. Then $\|R\|_2^2 = \nu(R^T R) = \nu(\Lambda^2)
= \lambda(R)^2$. □



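Thus, when $R$ is symmetric, we have $\lambda(R) = \|R\|_2$, and hence
$\|\cdot\|_2$ is the minimum matrix norm, uniformly for all symmetric $R$.
However, when $R$ is not symmetric, $\|R\|_2/\lambda(R)$ can be arbitrarily
large, even though $0 \le R \le J$. Consider, for example,

[matrix $R$ with entries including $1 - 2\varepsilon$ and $\varepsilon$; its layout is not recoverable here]

for sufficiently small $\varepsilon > 0$. Then $\lambda(R) \le
\sqrt{\varepsilon} + \varepsilon$, and $\|R\|_2 \ge 1 - 2\varepsilon$, so
$\lim_{\varepsilon \to 0} \|R\|_2/\lambda(R) = \infty$. Also, $\|\cdot\|_2$ is
not necessarily the minimum norm for asymmetric $R$. We always have
$\|R\|_2 \le \sqrt{n}\,\|R\|_1$ ([23], page 314), but this bound can almost be
achieved for $0 \le R \le J$. Consider

$$R = \begin{bmatrix} 1 - n\varepsilon & 1 - n\varepsilon & \cdots & 1 - n\varepsilon \\ \varepsilon & \varepsilon & \cdots & \varepsilon \\ \vdots & \vdots & & \vdots \\ \varepsilon & \varepsilon & \cdots & \varepsilon \end{bmatrix}$$

for any $0 < \varepsilon < 1/n$. Then $\|R\|_1 = 1 - \varepsilon$, but
$\|R\|_2 \ge (1 - n\varepsilon)\sqrt{n}$, so $\lim_{\varepsilon \to 0}
\|R\|_2/\|R\|_1 = \sqrt{n}$. On the other hand, $\|\cdot\|_2$ does have the
following minimality property.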



Lemma 13. For any matrix norm $\|\cdot\|$, $\|R\|_2 \le
\sqrt{\|R\|\,\|R\|^*}$.

Proof. $\|R\|_2^2 = \lambda(R^T R) \le \|R^T R\| \le \|R^T\|\,\|R\| =
\|R\|^*\,\|R\|$, using Lemmas 10 and 12. □

For a matrix norm $\|\cdot\|$, the quantities $J_n = \|J\|$, for $J \in
\mathbb{M}_n$, and $C_n = \|\mathbf{1}\|\,\|\mathbf{1}\|^*$, will be used
below. We collect some of their properties here. In particular, $J_n = n$ for
$\|\cdot\|_1$, $\|\cdot\|_2$, $\|\cdot\|_\infty$ and the Frobenius norm, by
direct calculation. More generally,

Lemma 14. Let $\|\cdot\|$ be a matrix norm. Then:

(i) if $J_n^* = \|J\|^*$, then $J_n^* = J_n$;

(ii) $n \le J_n \le C_n$;

(iii) if $\|\cdot\|$ is an operator norm, then $J_n = C_n$;

(iv) if $\|\cdot\|$ is induced by a vector norm which is symmetric in the
coordinates, then $J_n = n$;

(v) if $\|\cdot\|_p$ is induced by the vector $p$-norm ($1 \le p \le \infty$),
then $J_n = n$;

(vi) if $\|\cdot\|_w = \|W(\cdot)W^{-1}\|_1$, where $W = \mathrm{diag}(w)$ for
a column vector $w > 0$ with $\|w\|_1 = 1$, then $J_n = 1/w_{\min}$, where
$w_{\min} = \min_i w_i$.

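Proof. We have:

(i) $J_n^* = \|J\|^* = \|J^T\| = \|J\| = J_n$.

(ii) $J = \mathbf{1}\mathbf{1}^T$ so $n\mathbf{1} = J\mathbf{1}$. Thus
$n\|\mathbf{1}\| \le \|J\|\,\|\mathbf{1}\|$. Now $\|\mathbf{1}\| \ne 0$, so
cancellation gives the first inequality. The second follows by
submultiplicativity and duality.

(iii) $J_n = \|J\| = \|\mathbf{1}\mathbf{1}^T\| = \max_{x \ne 0}
\|\mathbf{1}\mathbf{1}^T x\|_{\mathrm{vec}}/\|x\|_{\mathrm{vec}}$, where $x$ is
a length-$n$ vector. Pulling scalar multiples out of the vector norm in the
numerator, this is equal to $\|\mathbf{1}\|_{\mathrm{vec}} \max_{x \ne 0}
|\mathbf{1}^T x|/\|x\|_{\mathrm{vec}}$. Now by Lemma 5,
$\|\mathbf{1}\|_{\mathrm{vec}} = \|1\|_{\mathrm{vec}}\,\|\mathbf{1}\|$, and
hence $J_n = \|\mathbf{1}\| \max_{x \ne 0} \|\mathbf{1}^T x\|_{\mathrm{vec}}/
\|x\|_{\mathrm{vec}} = \|\mathbf{1}\|\,\|\mathbf{1}^T\| = C_n$.

(iv) Let $x$ be any column vector such that $\mathbf{1}^T x = n$. Let
$x^\sigma$ be $x$ after a coordinate permutation $\sigma$, and $\bar{x} =
\sum_\sigma x^\sigma/n!$. Clearly, $\bar{x} = \mathbf{1}$. Also,
$\|\bar{x}\|_{\mathrm{vec}} \le \|x\|_{\mathrm{vec}}$, and $\mathbf{1}^T
\bar{x} = \mathbf{1}^T x = n$ by subadditivity of $\|\cdot\|_{\mathrm{vec}}$
and symmetry, so $\|\mathbf{1}\|^*_{\mathrm{vec}} = \max_{x \ne 0}
\mathbf{1}^T x/\|x\|_{\mathrm{vec}} \le \max_{x \ne 0} \mathbf{1}^T \bar{x}/
\|\bar{x}\|_{\mathrm{vec}} = n/\|\mathbf{1}\|_{\mathrm{vec}}$. Thus $C_n =
\|\mathbf{1}\|_{\mathrm{vec}}\,\|\mathbf{1}\|^*_{\mathrm{vec}} \le n$ by
Lemmas 5 and 6, and $J_n = n$ follows from (ii) and (iii).

(v) This follows directly from (iv).

(vi) $J_n = \|J\|_w = \|Z\|_1$, where $Z_{ij} = w_i/w_j$. Thus, $J_n =
\sum_{i=1}^n w_i/\min_j w_j = 1/w_{\min}$. □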

Remark 1. For an arbitrary matrix norm, we can have $C_n > J_n$. This
is true even if the norm is invariant under row and column permutations.
For example, $\|\cdot\| = \max\{\|\cdot\|_1, \|\cdot\|_\infty\}$ is a matrix
norm, with $\|J\| = \|\mathbf{1}\| = \|\mathbf{1}\|^* = n$, which even
satisfies $\|I\| = 1$ (see [23], page 308). For this norm, $C_n/J_n = n$. In
general, the ratio is unbounded, even for a fixed $n$. Consider,
for example, || • || = max{||VF • 111; ■ \\oa}, where W = diag(v) for 
a column vector $v > 0$ with $\|v\|_1 = 1$. It is easy to show that this is a
matrix norm with $C_n/J_n = \max_i v_i/\min_i v_i$, which can be arbitrarily
large.

We will use the following technical lemma, which appears as Lemma 9
in [14] for the norm $\|\cdot\|_1$. We show that, for any nonnegative matrix
$R$ with $\|R\| < 1$, there is a row vector $w$ which approximately satisfies
the condition of Lemma 7, and has $w_{\min}$ not too small.

Lemma 15. Let $R \in \mathbb{M}^+_n$, and let $\|\cdot\|$ be a matrix norm such
that $\|R\| \le \mu < 1$. Then for any $0 < \eta < 1 - \mu$, there is a matrix
$R' \ge R$ and a row vector $w > 0$ such that $wR' \le \mu' w$,
$\|w\|_\infty = 1$ and $w_{\min} = \min_i w_i \ge \eta/J_n$, where $\mu' =
\mu + \eta < 1$.

Proof. Let $J' = J/J_n$, and $R' = R + \eta J'$. Then $R'$ is irreducible, and
$\|R'\| \le \|R\| + \eta\|J'\| \le \mu + \eta$. Then by Lemma 10,
$\lambda(R') \le \mu + \eta = \mu'$. Thus, by Lemma 7, there exists $w > 0$
such that $wR' \le \mu' w$. We normalize so that $\|w\|_\infty = 1$. Then
$w \ge \mu' w \ge wR' \ge \eta w J' = \eta\mathbf{1}^T/J_n$, and hence
$w_{\min} \ge \eta/J_n$. □

2.2. Random update and systematic scan Glauber dynamics. The framework
and notation are from [14, 15]. The set of sites of the spin system will be
$V = [n] = \{1, 2, \ldots, n\}$, and the set of spins will be $\Sigma = [q]$.
A configuration (or state) is an assignment of a spin to each site, and
$\Omega^+ = \Sigma^n$ denotes the set of all such configurations. Let
$M = q^n = |\Omega^+|$, and we will suppose $\Omega^+ = [M]$.

Local interaction between sites specifies the relative likelihood of possible
(local) subconfigurations. Taken together, these give a well-defined
probability distribution $\pi$ on the set of configurations $\Omega^+$. Glauber
dynamics is a Markov chain $(x_t)$ on configurations that updates spins one
site at a time, and converges to $\pi$. We measure the convergence of this
chain by the total variation distance $d_{TV}(\cdot, \cdot)$. We will abuse
notation to write, for example, $d_{TV}(x_t, \pi)$ rather than
$d_{TV}(\mathcal{L}(x_t), \pi)$. The mixing time $\tau(\varepsilon)$ is then
defined by $\tau(\varepsilon) = \min_t\{d_{TV}(x_t, \pi) \le \varepsilon\}$. In
our setting, $n$ measures the size of configurations in $\Omega^+$, and we
presume it to be large. Thus, for convenience, we also use asymptotic bounds
$\hat\tau(\varepsilon)$, which have the property that $\limsup_{n \to \infty}
\tau(\varepsilon)/\hat\tau(\varepsilon) \le 1$.

We use the following notation. If $x$ is a configuration and $j$ is a site then
$x_j$ denotes the spin at site $j$ in $x$. For each site $j$, $S_j$ denotes the
set of pairs of configurations that agree off of site $j$. That is, $S_j$ is
the set of pairs $(x, y) \in \Omega^+ \times \Omega^+$ such that, for all
$i \ne j$, $x_i = y_i$. For any state $x$ and spin $c$, we use $x \to_j c$ for
the state $y$ such that $y_i = x_i$ ($i \ne j$) and $y_j = c$. For each
site $j$, we have a transition matrix $P^{[j]}$ on the state space $\Omega^+$
which satisfies two properties:

(i) $P^{[j]}$ changes one configuration to another by updating only the spin
at site $j$. That is, if $P^{[j]}(x, y) > 0$, then $(x, y) \in S_j$.

(ii) The equilibrium distribution $\pi$ is invariant with respect to
$P^{[j]}$. That is, $\pi P^{[j]} = \pi$.

Random update Glauber dynamics corresponds to a Markov chain
$\mathcal{M}^\dagger$ with state space $\Omega^+$ and transition matrix
$P^\dagger = (1/n) \sum_{j=1}^n P^{[j]}$. Systematic scan Glauber dynamics
corresponds to a Markov chain $\mathcal{M}^{\to}$ with state space $\Omega^+$
and transition matrix $P^{\to} = \prod_{j=1}^n P^{[j]}$.

It is well known that the mixing times $\tau_r(\varepsilon)$ of
$\mathcal{M}^\dagger$ and $\tau_s(\varepsilon)$ of $\mathcal{M}^{\to}$ can be
bounded in terms of the influences of sites on each other. To be more precise,
let $\mu_j(x, \cdot)$ be the distribution on spins at site $j$ induced by
$P^{[j]}(x, \cdot)$. Thus, $\mu_j(x, c) = P^{[j]}(x, x \to_j c)$. Now let
$\hat\varrho_{ij}$ be the influence of site $i$ on site $j$, which is given by
$\hat\varrho_{ij} = \max_{(x,y) \in S_i} d_{TV}(\mu_j(x, \cdot),
\mu_j(y, \cdot))$. A dependency matrix for the spin system is any
$n \times n$ matrix $R = (\varrho_{ij})$ such that $\varrho_{ij} \ge
\hat\varrho_{ij}$. Clearly, we may assume $\varrho_{ij} \le 1$.

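Given a dependency matrix $R$, let $\varrho_j$ denote the $j$th column of $R$,
for $j \in [n]$. Now let $R_j \in \mathbb{M}^+_n$ be the matrix which is an
identity except for column $j$, which is $\varrho_j$, that is,

$$(1) \qquad (R_j)_{ik} = \begin{cases} \varrho_{ij}, & \text{if } k = j; \\ 1, & \text{if } i = k \ne j; \\ 0, & \text{otherwise.} \end{cases}$$

Let $R^\dagger = \frac{1}{n} \sum_{j=1}^n R_j = \frac{n-1}{n} I + \frac{1}{n} R$
define the random update matrix for $R$, and let $\vec{R} = R_1 R_2 \cdots R_n$
define the scan update matrix for $R$.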
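
The two update matrices can be built mechanically from a dependency matrix. The following is a minimal sketch with hypothetical function names, using 0-based site indices and matrices stored as lists of rows; it is meant only to make the definitions concrete.

```python
def update_matrix(R, j):
    """R_j: the identity matrix with column j replaced by column j of R."""
    n = len(R)
    return [[R[i][j] if k == j else (1.0 if i == k else 0.0)
             for k in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][l] * B[l][k] for l in range(n)) for k in range(n)]
            for i in range(n)]

def random_update_matrix(R):
    """Average of the R_j; this equals ((n-1)/n) I + (1/n) R."""
    n = len(R)
    mats = [update_matrix(R, j) for j in range(n)]
    return [[sum(m[i][k] for m in mats) / n for k in range(n)]
            for i in range(n)]

def scan_update_matrix(R):
    """Product R_1 R_2 ... R_n, taken in site order."""
    n = len(R)
    result = update_matrix(R, 0)
    for j in range(1, n):
        result = mat_mul(result, update_matrix(R, j))
    return result

if __name__ == "__main__":
    R = [[0.0, 0.5], [0.25, 0.0]]
    print(random_update_matrix(R))
    print(scan_update_matrix(R))
```

Averaging the $R_j$ entry by entry reproduces the closed form $\frac{n-1}{n} I + \frac{1}{n} R$, which is a quick consistency check on the definition of $R_j$.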

2.3. The applicability of Lemmas 1 and 2. In this section, we give an
example of a family of spin systems for which Lemmas 1 and 2 can be used 
to prove rapid mixing, while previous theorems are inapplicable. 

Facilitated spin models (see [5]) are a class of spin systems in which each
spin is either resampled from its equilibrium distribution or is not
resampled, depending on whether the surrounding configuration satisfies a local
constraint. Consider the following variant of a facilitated spin model on
$n$ sites. On each step of the dynamics, a site $j$ is chosen uniformly at
random. The spin at the site is sampled from its equilibrium distribution,
which is the uniform distribution on $\{0, 1\}$, except that, if any of sites
$j - 2$, $j - 1$, or $j + 1$ has spin 1, then the resampling only occurs with
probability $\delta$ for some $\delta \in (0, 1)$.

Let $M$ be the $n \times n$ matrix which has a 1 in entries $(i, i - 1)$,
$(i, i + 1)$, and $(i, i + 2)$ (for $i \in \{1, \ldots, n\}$) and 0 in all
other entries. The dependency matrix $R$ of the spin system is
$\frac{1 - \delta}{2} M$.

Now suppose, for example, that for $n > 15$, we choose $\delta = \frac{1}{3} -
\frac{4}{3(n-2)}$. For these parameters, we will show that the $L_1$,
$L_\infty$ and spectral norms of $R$ are all at least 1 (and the $L_\infty$
norm exceeds 1) but $\lambda(R) < 1$. We can draw the following conclusions.

• Since the $L_1$ norm of $R$ is at least 1, the Dobrushin-condition methods of
[11, 19, 35, 39] cannot be used to show that random update Glauber
dynamics or systematic scan Glauber dynamics are rapidly mixing for
this spin system.

• Since the $L_\infty$ norm of $R$ exceeds 1, the methods of [4, 14] cannot be
used to show that random update Glauber dynamics or systematic scan
Glauber dynamics is rapidly mixing.

• Since the spectral norm of $R$ is at least 1, the methods of [22] are not
applicable.

• However, since $\lambda(R) < 1$, by Lemma 11, there is a norm $\|\cdot\|$
with $\|R\| < 1$ and Lemmas 1 and 2 can be used to show rapid mixing of both
random update Glauber dynamics and systematic scan Glauber dynamics.

Here is a proof that the $L_1$, $L_\infty$ and spectral norms of $R$ are all at
least 1 (and the $L_\infty$ norm exceeds 1) but $\lambda(R) < 1$ (as claimed
above).

Let $b = 2/(1 - \delta) = 3(1 - 2/n)$. Each norm of $R$ is the corresponding
norm of $M$ divided by $b$, so we wish to show that the $L_1$, $L_\infty$, and
spectral norms of $M$ are at least $b$, but that $\lambda(M) < b$.

The $L_1$ and $L_\infty$ norms are easy, so start with the spectral norm
$\|M\|_2 = \sqrt{\lambda(P)}$, where $P$ denotes $M^T M$. Since $P$ is
symmetric, by Lemma 12, $\lambda(P) = \nu(P)$. Let $x$ be the length-$n$ vector
in which every entry is $1/\sqrt{n}$. Then $\nu(P) \ge x^T P x = (1/n)
\sum_{i,j} P_{ij} = (1/n)(9n - 18) = 9 - 18/n$. Thus, $\|M\|_2 \ge
3(1 - 2/n)^{1/2} > b$.

Finally, we wish to show $\lambda(M) < b$. By Lemma 7, it suffices to find
$w > 0$ satisfying $Mw \le \mu w$. This will imply $\lambda(M) \le \mu$. We
will take $\mu = 2.62$ which is less than $b$ for $n > 15$. Let $x = 1.525$ and
$w_i = x^{-i}$. Then the $i$th row of $Mw$ is at most

$$w_{i-1} + w_{i+1} + w_{i+2} = x^{-i}\left(x + \frac{1}{x} + \frac{1}{x^2}\right) \le w_i \mu,$$

so we are finished.
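
The calculations in this example are easy to replay for a concrete $n$. The sketch below is a hedged illustration with hypothetical function names: it builds $M$ with 0-based indices (ones at $(i, i-1)$, $(i, i+1)$, $(i, i+2)$, with out-of-range positions simply dropped, which is an assumption about the boundary rows), and checks the same quantities used in the argument rather than computing eigenvalues directly.

```python
import math

def build_M(n):
    """0/1 matrix with ones at (i, i-1), (i, i+1), (i, i+2) inside the range."""
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1, i + 2):
            if 0 <= j < n:
                M[i][j] = 1
    return M

def check_example(n, mu=2.62, x=1.525):
    M = build_M(n)
    b = 3 * (1 - 2 / n)
    norm_1 = max(sum(M[i][j] for i in range(n)) for j in range(n))
    norm_inf = max(sum(row) for row in M)
    # Sum of the entries of P = M^T M equals the sum of squared row sums of M.
    entry_sum_P = sum(sum(row) ** 2 for row in M)
    spectral_lower = math.sqrt(entry_sum_P / n)   # sqrt(x^T P x), x = 1/sqrt(n)
    w = [x ** -(i + 1) for i in range(n)]         # w_i = x^{-i}, sites 1..n
    Mw = [sum(M[i][j] * w[j] for j in range(n)) for i in range(n)]
    perron_ok = all(Mw[i] <= mu * w[i] for i in range(n))
    return {"b": b, "norm_1": norm_1, "norm_inf": norm_inf,
            "entry_sum_P": entry_sum_P, "spectral_lower": spectral_lower,
            "perron_ok": perron_ok, "mu_below_b": mu < b}

if __name__ == "__main__":
    print(check_example(16))
```

For $n = 16$ this gives $b = 2.625$, both the maximum row and column sums of $M$ equal to 3, a spectral lower bound of $\sqrt{9 - 18/16}$, and a vector $w$ certifying $\lambda(M) \le 2.62 < b$.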
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3. Mixing conditions for Glauber dynamics. There are two approaches 
to proving mixing results based on the dependency matrix, path coupling 
and Dobrushin uniqueness. These are, in a certain sense, dual to each other. 
All the results given here can be derived equally well using either approach, 
as we will show. 

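3.1. Path coupling. First, consider applying path coupling to the random
update Glauber dynamics. We will begin by proving a simple property of
the random update matrix $R^\dagger$.

Lemma 16. Let $R$ be a dependency matrix for a spin system, and $\|\cdot\|$
any operator norm such that $\|R\| \le \mu < 1$. Then $\|R^\dagger\| \le \mu^\dagger$, where
$\mu^\dagger = 1 - \frac{1}{n}(1-\mu) < 1$.

Proof. $\|R^\dagger\| \le \frac{n-1}{n}\|I\| + \frac{1}{n}\|R\| = \frac{n-1}{n} + \frac{1}{n}\|R\| \le 1 - \frac{1}{n}(1-\mu) = \mu^\dagger$. □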
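As an illustration of Lemma 16, the sketch below checks the inequality for the $L_\infty$ operator norm on a small matrix. It assumes the random update matrix is $R^\dagger = \frac{n-1}{n} I + \frac{1}{n} R$ (the form used in the proof of Lemma 20); the example matrix and the helper names are hypothetical.

```python
def inf_norm(a):
    """Maximum absolute row sum, i.e. the L-infinity operator norm."""
    return max(sum(abs(v) for v in row) for row in a)


def random_update_matrix(r):
    """Return ((n-1)/n) I + (1/n) R for a square matrix r (assumed form of R-dagger)."""
    n = len(r)
    return [
        [((n - 1) / n if i == j else 0.0) + r[i][j] / n for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 3-site dependency matrix with zero diagonal.
R = [
    [0.0, 0.3, 0.2],
    [0.4, 0.0, 0.1],
    [0.25, 0.25, 0.0],
]

if __name__ == "__main__":
    n = len(R)
    mu = inf_norm(R)
    mu_dagger = 1.0 - (1.0 - mu) / n
    print("||R||        =", mu)
    print("||R_dagger|| =", inf_norm(random_update_matrix(R)))
    print("mu_dagger    =", mu_dagger)
```

For a nonnegative matrix and this norm the bound is attained whenever some row attains the maximum row sum, so the two printed values should agree up to rounding.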

We can use this to bound the mixing time of the random update Glauber 
dynamics. 

Lemma 17. Suppose $R$ is a dependency matrix for a spin system, and
let $\|\cdot\|$ be any matrix norm. If $\|R\| \le \mu < 1$, then the mixing time $\tau_r(\varepsilon)$ of
random update Glauber dynamics is at most $n(1-\mu)^{-1}\ln(C_n/\varepsilon)$.

Proof. We will use path coupling. See, for example, [18]. Let $x_0, y_0 \in \Omega^+$
be the initial configurations of the coupled chains, and $x_t, y_t$ the states
after $t$ steps of the coupling. The path $z_0,\ldots,z_n$ from $x_t$ to $y_t$ has states
$z_0 = x_t$, and $z_i = (z_{i-1} \to^i y_t(i))$ $(i \in [n])$, so $z_n = y_t$.

We define a distance metric between pairs of configurations as follows.
For every $i \in [n]$, we choose a constant $0 < \delta_i \le 1$, and we define the distance
between configurations in $S_i$ to be $\delta_i$. That is, for every $(x,y) \in S_i$, we define
$d_\delta(x,y) = \delta_i$. We then lift these distances to a path metric. In particular, for
every pair of configurations $(x,y)$, $d_\delta(x,y) = \sum_{i=1}^n \delta_i \mathbf{1}\{x(i) \ne y(i)\}$. The $\delta_i$
$(i \in [n])$ make up a column vector $\delta > 0$. Note that $d_{\mathbf{1}}(\cdot,\cdot)$ is the usual
Hamming distance.

Following the path-coupling paradigm, we now define a coupling of one
step for each pair of starting states in $S_i$ (for every $i \in [n]$). This gives us
a coupling of one step for every pair $(z_i, z_{i+1})$ in the path between $x_t$ and
$y_t$ and these can be composed to obtain a coupling of one step from the
starting pair $(x_t, y_t)$.

The coupling will be to make the same vertex choice for all $(x_t,y_t) \in S_i$
and then maximally couple the spin choices. With this coupling, $\varrho_{ij}$ bounds
the probability of creating a disagreement at site $j$ for any $(x_t,y_t) \in S_i$ and
time $t$.

Now consider an arbitrary pair of configurations $(x_t,y_t)$. Let $\beta_t(i) =
\Pr(x_t(i) \ne y_t(i))$ determine a row vector $\beta_t$, so $E[d_\delta(x_t,y_t)] = \beta_t\delta$. Clearly,
$0 \le \beta_t \le \mathbf{1}^T$. Since $\beta_t$ and $\Pr(x_{t+1} = x, y_{t+1} = y \mid x_t,y_t)$ are independent,
it follows that

(2) $\beta_{t+1}\delta = E[d_\delta(x_{t+1},y_{t+1})] \le \sum_{i=1}^n \beta_t(i)\Bigl(\delta_i - \frac{1}{n}\delta_i + \frac{1}{n}\sum_{j=1}^n \varrho_{ij}\delta_j\Bigr) = \beta_t R^\dagger \delta.$

[The $i$th term in the sum comes from considering how the distance between
$z_{i-1}$ and $z_i$ changes under the coupling. Assuming $z_{i-1}$ and $z_i$ differ (at site $i$)
then $\delta_i$ is the reduction in distance that comes about by updating site $i$ and
removing the disagreement there, while $\delta_j\varrho_{ij}$ is the expected increase in
distance that arises when site $j$ is updated and a disagreement is created
there.] Now equation (2) holds for all $\delta$ with $0 < \delta_i \le 1$. In particular, for
any $\epsilon$, it holds for any vector $\delta$ in which one component is 1 and the other
components are $\epsilon$. Taking the limit, as $\epsilon \to 0$, we find that componentwise,

(3) $\beta_{t+1} \le \beta_t R^\dagger.$

Now, using (3) and induction on $t$, we find that

(4) $\beta_{t+1} \le \beta_0 (R^\dagger)^{t+1}.$

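Equation (4) implies $\beta_{t+1}\mathbf{1} \le \beta_0 (R^\dagger)^{t+1}\mathbf{1}$. Using the coupling lemma [13, 29],

$$d_{TV}(x_t, y_t) \le \Pr(x_t \ne y_t) \le \sum_{i=1}^n \Pr(x_t(i) \ne y_t(i)) = \beta_t\mathbf{1} \le \beta_0 (R^\dagger)^t \mathbf{1} \le \mathbf{1}^T (R^\dagger)^t \mathbf{1}.$$

Now applying Lemma 4 with $c = \mathbf{1}^T (R^\dagger)^t \mathbf{1}$ and using submultiplicativity
[property (iv) of matrix norms],

$$\mathbf{1}^T (R^\dagger)^t \mathbf{1} \le \|\mathbf{1}^T\|\,\|R^\dagger\|^t\,\|\mathbf{1}\| \le C_n \|R^\dagger\|^t.$$

But $\|R^\dagger\| \le \mu^\dagger = 1 - \frac{1}{n}(1-\mu)$ by Lemma 16. Thus, when $t \ge n(1-\mu)^{-1}\ln(C_n/\varepsilon)$,
$d_{TV}(x_t,y_t) \le C_n(\mu^\dagger)^t = C_n(1 - (1-\mu)/n)^t \le C_n e^{-(1-\mu)t/n} \le \varepsilon$. □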
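To make the bound of Lemma 17 concrete, the following sketch iterates the worst-case disagreement vector and checks that $\mathbf{1}^T (R^\dagger)^t \mathbf{1} \le \varepsilon$ once $t$ reaches $n(1-\mu)^{-1}\ln(C_n/\varepsilon)$. It assumes $R^\dagger = \frac{n-1}{n}I + \frac{1}{n}R$, uses the $L_\infty$ norm (for which $C_n = n$ by Corollary 18), and works with a hypothetical 3-site matrix; none of the helper names come from the paper.

```python
import math


def random_update_matrix(r):
    """Return ((n-1)/n) I + (1/n) R (assumed form of the random update matrix)."""
    n = len(r)
    return [
        [((n - 1) / n if i == j else 0.0) + r[i][j] / n for j in range(n)]
        for i in range(n)
    ]


def row_times_matrix(v, a):
    """Multiply the row vector v by the square matrix a."""
    n = len(a)
    return [sum(v[i] * a[i][j] for i in range(n)) for j in range(n)]


def disagreement_mass(r, t):
    """Return 1^T (R_dagger)^t 1, the quantity that bounds d_TV after t steps."""
    r_dagger = random_update_matrix(r)
    v = [1.0] * len(r)
    for _ in range(t):
        v = row_times_matrix(v, r_dagger)
    return sum(v)


def mixing_time_bound(n, mu, c_n, eps):
    """Smallest integer t with t >= n (1 - mu)^(-1) ln(C_n / eps)."""
    return math.ceil(n / (1.0 - mu) * math.log(c_n / eps))


# Hypothetical dependency matrix whose row sums all equal 0.5.
R = [
    [0.0, 0.3, 0.2],
    [0.4, 0.0, 0.1],
    [0.25, 0.25, 0.0],
]

if __name__ == "__main__":
    n, mu, eps = len(R), 0.5, 0.01
    t = mixing_time_bound(n, mu, c_n=n, eps=eps)
    print("t =", t, " 1^T (R_dagger)^t 1 =", disagreement_mass(R, t))
```

The sketch evaluates only the upper bound from the proof; it does not simulate the Markov chain itself.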

Corollary 18. Let $R$ be a dependency matrix for a spin system. Then
the mixing time $\tau_r(\varepsilon)$ of random update Glauber dynamics is at most
$n(1-\mu)^{-1}\ln(n/\varepsilon)$ if $R$ satisfies any of the following:

(i) the Dobrushin condition $\alpha = \|R\|_1 \le \mu < 1$;

(ii) the Dobrushin-Shlosman condition $\alpha' = \|R\|_\infty \le \mu < 1$;

(iii) a $p$-norm condition $\|R\|_p \le \mu < 1$ for any $1 \le p \le \infty$.

Proof. This follows directly from Lemma 17, since $C_n = J_n = n$ for
these norms, by Lemma 14. □

Corollary 19. Let $R$ be a dependency matrix for a spin system. Suppose
either of the following conditions holds:

(i) $w > 0$ is a row vector such that $wR \le \mu w$, $\|w\|_\infty = 1$ and $w_{\min} =
\min_i w_i$;

(ii) $w > 0$ is a column vector such that $Rw \le \mu w$, $\|w\|_\infty = 1$ and $w_{\min} =
\min_i w_i$.

Then the mixing time $\tau_r(\varepsilon)$ of random update Glauber dynamics is at most
$n(1-\mu)^{-1}\ln(1/(w_{\min}\varepsilon))$.

Proof. Both are proved similarly, using Lemma 17 with a suitable
operator norm, so $C_n = J_n$.

(i) Let $W = \mathrm{diag}(w)$ define the norm $\|R\|_w = \|WRW^{-1}\|_1$. Then $\|R\|_w \le
\mu$, and $J_n = 1/w_{\min}$ by Lemma 14.

(ii) Let $W = \mathrm{diag}(w)$ define the norm $\|R\|_w = \|W^{-1}RW\|_\infty = \|WR^TW^{-1}\|_1$.
Then $\|R\|_w \le \mu$, and $J_n = 1/w_{\min}$ by Lemma 14. □

In the setting of Corollary 19 (i), we can also show contraction of the
associated metric $d_w(\cdot,\cdot)$.

Lemma 20. Suppose $R$ is a dependency matrix for a spin system, and
let $w > 0$ be a column vector such that $Rw \le \mu w$. Then $E[d_w(x_{t+1}, y_{t+1})] \le
\mu^\dagger E[d_w(x_t,y_t)]$ for all $t \ge 0$.

Proof. Note that $R^\dagger w = \frac{n-1}{n}w + \frac{1}{n}Rw \le \bigl(\frac{n-1}{n} + \frac{\mu}{n}\bigr)w = \mu^\dagger w$. Putting
$\delta = w$ in (2),

$$E[d_w(x_{t+1}, y_{t+1})] = \beta_{t+1}w \le \beta_t R^\dagger w \le \mu^\dagger \beta_t w = \mu^\dagger E[d_w(x_t,y_t)].$$ □

Remark 2. We may be able to use Lemma 20 to obtain a polynomial
mixing time in the "equality case" $\mu^\dagger = 1$ of path coupling. However, it is
difficult to give a general result other than in "soft core" systems, where 
all spins can be used to update all sites in every configuration. See [3] for 
details. We will not pursue this here, however. Note that mixing for the 
equality case apparently cannot be obtained from the Dobrushin analysis of 
Section 3.2. This is perhaps the most significant difference between the two 
approaches. 

We would like to use an eigenvector in Corollary 19, since then $\mu = \lambda(R) \le
\|R\|$ for any norm. An important observation is that we cannot necessarily
do this because $R$ may not be irreducible (so $w_{\min}$ may be 0) or $w_{\min}$ may
simply be too small.

$$\varrho_{ij} = \begin{cases}
0.1, & 1 \le i \le n-2,\ j = i+1;\\
0.4, & 3 \le i \le n,\ j = i-1;\\
0.8, & i = 2,\ j = 1;\\
0.2, & i = n-1,\ j = n;\\
0, & \text{otherwise.}
\end{cases}$$

Fig. 1. Example 1.

Example 1. Consider the matrix of Figure 1. Here $R$ is irreducible,
with $\lambda(R) = 0.4$ and left eigenvector $w$ such that $w_i \propto 2^{-i}$ $(i \in [n])$. Thus,
$w_{\min} \le w_n/w_1 = 2^{1-n}$ is exponentially small, and Corollary 19(i) would give
a mixing time estimate of $O(n^2)$ site updates. In fact, $R$ satisfies the
Dobrushin condition with $\alpha = 0.8$ and the Dobrushin-Shlosman condition with
$\alpha' = 0.9$, so we know mixing actually occurs in $O(n \log n)$ updates.
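The claims of Example 1 can be verified directly for a small $n$. The sketch below builds the matrix of Figure 1 under the reading $\varrho_{i,i+1} = 0.1$ $(1 \le i \le n-2)$, $\varrho_{i,i-1} = 0.4$ $(3 \le i \le n)$, $\varrho_{2,1} = 0.8$, $\varrho_{n-1,n} = 0.2$, and takes $\|R\|_1$ to be the maximum column sum and $\|R\|_\infty$ the maximum row sum; the function names are illustrative.

```python
def example1_matrix(n):
    """Build the n x n matrix of Figure 1 (the text indexes sites from 1)."""
    if n < 3:
        raise ValueError("need n >= 3")
    r = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        if i <= n - 2:
            r[i - 1][i] = 0.1        # rho_{i,i+1} for 1 <= i <= n-2
        if i >= 3:
            r[i - 1][i - 2] = 0.4    # rho_{i,i-1} for 3 <= i <= n
    r[1][0] = 0.8                    # rho_{2,1}
    r[n - 2][n - 1] = 0.2            # rho_{n-1,n}
    return r


def max_column_sum(a):
    """The L_1 matrix norm of a nonnegative matrix (Dobrushin's alpha)."""
    return max(sum(row[j] for row in a) for j in range(len(a)))


def max_row_sum(a):
    """The L_infinity matrix norm of a nonnegative matrix (alpha prime)."""
    return max(sum(row) for row in a)


def left_eigen_residual(a, w, lam):
    """Largest entry of |wA - lam * w| for a row vector w."""
    n = len(a)
    return max(
        abs(sum(w[i] * a[i][j] for i in range(n)) - lam * w[j]) for j in range(n)
    )


if __name__ == "__main__":
    n = 8
    r = example1_matrix(n)
    w = [2.0 ** -(i + 1) for i in range(n)]   # w_i proportional to 2^(-i)
    print("alpha  =", max_column_sum(r))
    print("alpha' =", max_row_sum(r))
    print("residual of wR = 0.4 w:", left_eigen_residual(r, w, 0.4))
    print("w_n / w_1 =", w[-1] / w[0])
```

Because $w_n/w_1 = 2^{1-n}$, the term $\ln(1/(w_{\min}\varepsilon))$ in Corollary 19 grows linearly in $n$, which is where the $O(n^2)$ estimate comes from.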

However, if we know $\|R\| < 1$ for any norm $\|\cdot\|$, we can use Lemma 15 to
create a better lower bound on $w_{\min}$. We apply this observation as follows.

Corollary 21. Let $R$ be a dependency matrix for a spin system, and let
$\|\cdot\|$ be any matrix norm. Suppose $\|R\| \le \mu < 1$. Then for any $0 < \eta < 1 - \mu$,
the mixing time of random update Glauber dynamics is bounded by

$$\tau_r(\varepsilon) \le n(1 - \mu - \eta)^{-1}\ln(J_n/(\eta\varepsilon)).$$

Proof. Choose $0 < \eta < 1 - \mu$. Let $R'$ be the matrix from Lemma 15.
Since $R' \ge R$, it is a dependency matrix for the spin system. Let $w$ be the
vector from Lemma 15. Now by Corollary 19, the mixing time is bounded
by $\tau_r(\varepsilon) \le n(1-\mu')^{-1}\ln(1/(w_{\min}\varepsilon))$, where $w_{\min} \ge \eta/J_n$ and $\mu' = \mu + \eta$. □

From this we can now prove Lemma 1, which is a strengthening of Lemma 17
for an arbitrary norm.

Lemma 1. Let $R$ be a dependency matrix for a spin system, and let $\|\cdot\|$
be any matrix norm such that $\|R\| \le \mu < 1$. Then the mixing time of random
update Glauber dynamics is bounded by

$$\tau_r(\varepsilon) \sim n(1-\mu)^{-1}\ln((1-\mu)^{-1}J_n/\varepsilon).$$

Proof. Choose $\eta = (1-\mu)/\ln n$. Substituting this into the mixing time
from Corollary 21 now implies the conclusion, since $J_n \ge n$. □

Remark 3. The mixing time estimate is $\hat\tau_r(\varepsilon) \sim n(1-\mu)^{-1}\ln((1-\mu)^{-1}J_n/\varepsilon)$.
If $(1-\mu)$ is not too small, for example, if $(1-\mu) = \Omega(\log^{-k} n)$
for any constant $k > 0$, we have $\hat\tau_r(\varepsilon) \sim n(1-\mu)^{-1}\ln(J_n/\varepsilon)$. Thus, we lose
little asymptotically using Lemma 1, which holds for an arbitrary matrix
norm, from the mixing time estimate $\hat\tau_r(\varepsilon) = n(1-\mu)^{-1}\ln(J_n/\varepsilon)$, which
results from applying Lemma 17 with an operator norm $\|\cdot\|$. The condition
$(1-\mu) = \Omega(\log^{-k} n)$ holds, for example, when $(1-\mu)$ is a small positive
constant, which is the case in many applications.
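The size of the loss described in Remark 3 can be seen numerically. The sketch below compares the bound of Corollary 21 with $\eta = (1-\mu)/\ln n$ against $n(1-\mu)^{-1}\ln(J_n/\varepsilon)$, under the assumption $J_n = n$ (as for the $p$-norms); the values $\mu = 0.5$ and $\varepsilon = 0.01$ are arbitrary illustrative choices.

```python
import math


def corollary21_bound(n, mu, eta, j_n, eps):
    """n (1 - mu - eta)^(-1) ln(J_n / (eta * eps)), valid for 0 < eta < 1 - mu."""
    if not 0.0 < eta < 1.0 - mu:
        raise ValueError("eta must lie strictly between 0 and 1 - mu")
    return n / (1.0 - mu - eta) * math.log(j_n / (eta * eps))


def operator_norm_bound(n, mu, j_n, eps):
    """n (1 - mu)^(-1) ln(J_n / eps), the estimate for an operator norm."""
    return n / (1.0 - mu) * math.log(j_n / eps)


def loss_ratio(n, mu=0.5, eps=0.01):
    """Ratio of the two bounds with eta = (1 - mu) / ln n and J_n = n."""
    eta = (1.0 - mu) / math.log(n)
    return corollary21_bound(n, mu, eta, n, eps) / operator_norm_bound(n, mu, n, eps)


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6, 10 ** 12):
        print(n, round(loss_ratio(n), 3))
```

The ratio stays above 1 and decreases slowly with $n$, consistent with the statement that the two estimates agree asymptotically.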

We can easily extend the analysis above to deal with systematic scan.
Here the mixing time $\tau_s(\varepsilon)$ will be bounded as a number of complete scans.
The number of individual Glauber steps is then $n$ times this quantity. The
following lemma modifies the proof technique of [14], Section 7.

Lemma 22. Let $R$ be a dependency matrix for a spin system, and $\|\cdot\|$
any matrix norm. If $\|\vec{R}\| \le \mu < 1$, the mixing time $\tau_s(\varepsilon)$ of systematic scan
Glauber dynamics is at most $(1-\mu)^{-1}\ln(C_n/\varepsilon)$. If $\|\cdot\|$ is an operator norm,
the mixing time is at most $(1-\mu)^{-1}\ln(J_n/\varepsilon)$.

Proof. We use the same notation and proof method as in Lemma 17.
Consider an application of $P^{[j]}$, with associated matrix $R_j$, as defined in (1).
Then it follows that

$$E[d_\delta(x_1,y_1)] \le \sum_{i=1}^n \beta_0(i)\,(R_j\delta)_i = \beta_0 R_j \delta.$$

If, as before, $\delta_i = 1$ and $\delta_k \to 0$ for $k \ne i$, we have $\Pr(x_1(i) \ne y_1(i)) \le (\beta_0 R_j)(i)$.
Now it follows that $E[d_\delta(x_n,y_n)] \le \beta_0\bigl(\prod_{j=1}^n R_j\bigr)\delta = \beta_0\vec{R}\delta$ and $E[d_\delta(x_{nt},y_{nt})] \le
\beta_0\vec{R}^t\delta$. Thus, $\Pr(x_{nt}(i) \ne y_{nt}(i)) \le (\beta_0\vec{R}^t)(i)$. Hence,

$$d_{TV}(x_{nt},y_{nt}) \le \sum_{i=1}^n \Pr(x_{nt}(i) \ne y_{nt}(i)) \le \beta_0\vec{R}^t\mathbf{1} \le \mathbf{1}^T\vec{R}^t\mathbf{1} \le \|\vec{R}\|^t\,\|\mathbf{1}^T\|\,\|\mathbf{1}\|.$$

The remainder of the proof is now similar to Lemma 17. □

The following lemma was proved in a slightly different form in [14, Lemma 11].
It establishes the key relationship between $R$ and $\vec{R}$.

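Lemma 23. Let $R$ be a dependency matrix for a spin system. Suppose
$w > 0$ is a row vector, such that $wR \le \mu w$ for some $\mu \le 1$. Then $w\vec{R} \le \mu w$.

Proof. Note that for any row vector $z$, $zR_i = [z_1 \cdots z_{i-1}\ z\varrho_i\ z_{i+1} \cdots z_n]$,
where $\varrho_i$ denotes the $i$th column of $R$.
Since $wR \le \mu w \le w$, $w\varrho_i \le w_i$. Now we can show by induction on $i$ that
$wR_1 \cdots R_i \le [w\varrho_1 \cdots w\varrho_i\ w_{i+1} \cdots w_n]$. For the inductive step, $wR_1 \cdots R_i \le
zR_i = [z_1 \cdots z_{i-1}\ z\varrho_i\ z_{i+1} \cdots z_n]$ where $z = [w\varrho_1 \cdots w\varrho_{i-1}\ w_i \cdots w_n]$. But then
$z \le w$, so $z\varrho_i \le w\varrho_i$, so $zR_i \le [w\varrho_1 \cdots w\varrho_i\ w_{i+1} \cdots w_n]$. Taking $i = n$, we have

$$w\vec{R} \le [w\varrho_1 \cdots w\varrho_n] = wR \le \mu w.$$ □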

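Corollary 24. $\lambda(\vec{R}) \le \lambda(R)$ and if $\|R\|_1 \le 1$, $\|\vec{R}\|_1 \le \|R\|_1$.

Proof. The first statement follows directly from Lemmas 7 and 23. For
the second, note that $\mathbf{1}^T R \le \|R\|_1\mathbf{1}^T$, so $\mathbf{1}^T\vec{R} \le \|R\|_1\mathbf{1}^T$ by Lemma 23. But
this implies $\|\vec{R}\|_1 \le \|R\|_1$. □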
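The second statement of Corollary 24 can be illustrated on a small matrix. The sketch below assumes, following the identity $zR_i = [z_1 \cdots z_{i-1}\ z\varrho_i\ z_{i+1} \cdots z_n]$ used in the proof of Lemma 23, that $R_j$ is the identity matrix with column $j$ replaced by column $j$ of $R$, and that the scan matrix is the product $R_1 R_2 \cdots R_n$. The example matrix and helper names are hypothetical.

```python
def column_update_matrix(r, j):
    """Identity matrix with column j replaced by column j of r (assumed form of R_j)."""
    n = len(r)
    m = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for a in range(n):
        m[a][j] = r[a][j]
    return m


def mat_mul(a, b):
    """Plain product of two square matrices of the same size."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def scan_matrix(r):
    """Return R_1 R_2 ... R_n, one factor per site in scan order."""
    n = len(r)
    product = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for j in range(n):
        product = mat_mul(product, column_update_matrix(r, j))
    return product


def one_norm(a):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a)))


# Hypothetical dependency matrix with ||R||_1 = 0.65.
R = [
    [0.0, 0.3, 0.2],
    [0.4, 0.0, 0.1],
    [0.25, 0.25, 0.0],
]

if __name__ == "__main__":
    print("||R||_1      =", one_norm(R))
    print("||scan(R)||_1 =", one_norm(scan_matrix(R)))
```

For this example the first column of the scan matrix is unchanged from $R$, so the two norms coincide; later columns shrink because they already see the effect of earlier updates.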

We can now apply this to the mixing of systematic scan. First we show, 
as in [35], that the Dobrushin criterion implies rapid mixing. 

Corollary 25. Let $R$ be a dependency matrix for a spin system. Then
if $R$ satisfies the Dobrushin condition $\alpha = \|R\|_1 \le \mu < 1$, the mixing time of
systematic scan Glauber dynamics is at most $(1-\mu)^{-1}\ln(n/\varepsilon)$.

Proof. This follows from Lemma 22 and Corollary 24, since $J_n = n$ for
the norm $\|\cdot\|_1$. □

Next we show, as in [14], Section 3.3, that a weighted Dobrushin criterion 
implies rapid mixing. 

Corollary 26. Let $R$ be a dependency matrix for a spin system. Suppose
$w > 0$ is a row vector satisfying $\|w\|_\infty = 1$ and $wR \le \mu w$ for some
$\mu < 1$. Let $w_{\min} = \min_i w_i$. Then the mixing time $\tau_s(\varepsilon)$ of systematic scan
Glauber dynamics is bounded by $(1-\mu)^{-1}\ln(1/(w_{\min}\varepsilon))$.

Proof. By Lemma 23, $w\vec{R} \le \mu w$. We use the norm $\|\cdot\|_w = \|W \cdot W^{-1}\|_1$,
where $W = \mathrm{diag}(w)$. Then apply Lemma 22 with $\|\vec{R}\|_w \le \mu$. □



Once again, we cannot necessarily apply Corollary 26 directly since $w_{\min}$
may be too small (or even 0). Applying Corollary 26 to Example 1 would give
a mixing time estimate of $O(n)$ scans. However, $R$ satisfies the Dobrushin
condition with $\alpha = 0.8$ so we know mixing actually occurs in $O(\log n)$ scans.
Once again, our solution is to perturb $R$ using Lemma 15.

Corollary 27. Let $R$ be a dependency matrix for a spin system and
$\|\cdot\|$ a matrix norm such that $\|R\| \le \mu < 1$. Then for any $0 < \eta < 1 - \mu$, the
mixing time of systematic scan Glauber dynamics is bounded by

$$\tau_s(\varepsilon) \le (1 - \mu - \eta)^{-1}\ln(J_n/(\eta\varepsilon)).$$

Proof. Let $R'$ be the matrix and $w$ the vector from Lemma 15. Since
$R' \ge R$, it is a dependency matrix for the spin system. Now by Corollary 26,
the mixing time satisfies $\tau_s(\varepsilon) \le (1-\mu')^{-1}\ln(1/(w_{\min}\varepsilon))$, where $w_{\min} = \min_i w_i \ge
\eta/J_n$ and $\mu' = \mu + \eta$. □

We can now use this to prove Lemma 2. 

Lemma 2. Let $R$ be a dependency matrix for a spin system and $\|\cdot\|$ any
matrix norm such that $\|R\| \le \mu < 1$. Then the mixing time of systematic
scan Glauber dynamics is bounded by

$$\hat\tau_s(\varepsilon) \sim (1-\mu)^{-1}\ln((1-\mu)^{-1}J_n/\varepsilon).$$

Proof. This follows from Corollary 27 exactly as Lemma 1 follows from
Corollary 21. □

Remark 4. If, for example, $\|\cdot\| = \|\cdot\|_p$, for any $1 \le p \le \infty$, $J_n = n$, and
we obtain a mixing time $\hat\tau_s(\varepsilon) \sim (1-\mu)^{-1}\ln((1-\mu)^{-1}n/\varepsilon)$. If in addition,
$(1-\mu) = \Omega(\log^{-k} n)$ for any $k > 0$ (as in Remark 3), we have $\hat\tau_s(\varepsilon) \sim
(1-\mu)^{-1}\ln(n/\varepsilon)$, which matches the bound from Corollary 25 for the norm $\|\cdot\|_1$.
Note that there is a difference from the random update case, since here we
do not have a result like Lemma 17 which we can apply directly with any
operator norm.

3.2. Dobrushin uniqueness. The natural view of path coupling in this
setting corresponds to multiplying $R$ on the left by a row vector $\beta$, as in
Lemma 17. The Dobrushin uniqueness approach corresponds to multiplying
$R$ on the right by a column vector $\delta$. As we showed in [14], Section 7,
these two approaches are essentially equivalent. However, for historical
reasons, the Dobrushin uniqueness approach is frequently used in the statistical
physics literature. See, for example, [33, 35]. Therefore, for completeness, we
will now describe the Dobrushin uniqueness framework, using the notation
of [14].

Recall that $\Omega^+ = [M]$. For any column vector $f \in \mathbb{R}^M$, let $\delta_i(f) =
\max_{(x,y)\in S_i}|f(x) - f(y)|$. Let $\delta(f)$ be the column vector given by $\delta(f) =
(\delta_1(f), \delta_2(f), \ldots, \delta_n(f))$. Thus, $\delta : \mathbb{R}^M \to \mathbb{R}^n$. The following lemma gives the
key property of this function.

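Lemma 28 ([14], Lemma 10). The function $\delta$ satisfies $\delta(P^{[j]}f) \le R_j\delta(f)$.

Proof. Suppose $(x,y) \in S_i$ maximizes $|P^{[j]}f(x) - P^{[j]}f(y)|$. Then

$$\begin{aligned}
\delta_i(P^{[j]}f) &= |P^{[j]}f(x) - P^{[j]}f(y)|\\
&= \Bigl|\sum_c f(x \to^j c)P^{[j]}(x, x \to^j c) - \sum_c f(y \to^j c)P^{[j]}(y, y \to^j c)\Bigr|\\
&= \Bigl|\sum_c f(x \to^j c)\mu_j(x,c) - \sum_c f(y \to^j c)\mu_j(y,c)\Bigr|\\
&= \Bigl|\sum_c \bigl(f(x \to^j c) - f(y \to^j c)\bigr)\mu_j(x,c) + \sum_c f(y \to^j c)\bigl(\mu_j(x,c) - \mu_j(y,c)\bigr)\Bigr|\\
&\le \sum_c |f(x \to^j c) - f(y \to^j c)|\,\mu_j(x,c) + \Bigl|\sum_c f(y \to^j c)\bigl(\mu_j(x,c) - \mu_j(y,c)\bigr)\Bigr|.
\end{aligned}$$

We will bound the two terms in the last expression separately. First,

(5) $\sum_c |f(x \to^j c) - f(y \to^j c)|\,\mu_j(x,c) \le \max_c |f(x \to^j c) - f(y \to^j c)| \le \mathbf{1}_{i \ne j}\,\delta_i(f).$

For the second, let $f^+ = \max_c f(y \to^j c)$, $f^- = \min_c f(y \to^j c)$ and $f^0 =
\frac{1}{2}(f^+ + f^-)$. Note that $f^+ - f^0 = \frac{1}{2}(f^+ - f^-) \le \frac{1}{2}\delta_j(f)$. Then since
$\sum_c(\mu_j(x,c) - \mu_j(y,c)) = 0$,

(6) $\Bigl|\sum_c f(y \to^j c)\bigl(\mu_j(x,c) - \mu_j(y,c)\bigr)\Bigr| = \Bigl|\sum_c \bigl(f(y \to^j c) - f^0\bigr)\bigl(\mu_j(x,c) - \mu_j(y,c)\bigr)\Bigr|$
$\le 2\,d_{TV}(\mu_j(x,\cdot),\mu_j(y,\cdot))\max_c|f(y \to^j c) - f^0| \le \varrho_{ij}\delta_j(f).$

The conclusion now follows by adding (5) and (6). □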

The following lemma allows us to apply Lemma 28 to bound mixing times. 

Lemma 29. Let $\mathcal{M} = (X_t)$ be a Markov chain with transition matrix $P$,
and $\|\cdot\|$ a matrix norm. Suppose there is a matrix $R$ such that, for any
column vector $f \in \mathbb{R}^M$, $\delta(Pf) \le R\delta(f)$, and $\|R\| \le \mu < 1$. Then the mixing
time of $\mathcal{M}$ is bounded by

$$\tau(\varepsilon) \le (1-\mu)^{-1}\ln(C_n/\varepsilon).$$

Proof. For a column vector $f_0$, let $f_t$ be the column vector $f_t = P^t f_0$.
Let $\pi$ be the row vector corresponding to the stationary distribution of $\mathcal{M}$.
Note that $\pi f_t = \pi P^t f_0$, which is $\pi f_0$ since $\pi$ is a left eigenvector of $P$ with
eigenvalue 1.

Now let $f_0$ be the indicator vector for an arbitrary subset $A$ of $[M] =
\Omega^+$. That is, let $f_0(z) = 1$ if $z \in A$ and $f_0(z) = 0$ otherwise. Then since
$P^t(x,y) = \Pr(X_t = y \mid X_0 = x)$, we have $f_t(x) = \Pr(X_t \in A \mid X_0 = x)$. Also,
$\pi f_t = \pi f_0 = \pi(A)$ for all $t$. Let $f_t^- = \min_z f_t(z)$ and $f_t^+ = \max_z f_t(z)$. Since
$\pi$ is a probability distribution, $f_t^- \le \pi f_t \le f_t^+$, so $f_t^- \le \pi(A) \le f_t^+$.

By induction on $t$, using the condition in the statement of the lemma, we
have $\delta(f_t) \le R^t\delta(f_0)$. But $R^t\delta(f_0) \le R^t\mathbf{1}$. Now, consider states $x,y$ such that
$f_t(x) = f_t^-$, $f_t(y) = f_t^+$. Let $z_i$ $(i = 0, 1, \ldots, n)$ be the path of states from $x$
to $y$ used in the proof of Lemma 17. Then

$$f_t^+ - f_t^- = f_t(y) - f_t(x) \le \sum_{i=1}^n |f_t(z_i) - f_t(z_{i-1})| \le \sum_{i=1}^n \delta_i(f_t) = \mathbf{1}^T\delta(f_t) \le \mathbf{1}^T R^t\mathbf{1}.$$

This implies that $\max_x|\Pr(X_t \in A \mid X_0 = x) - \pi(A)| \le \mathbf{1}^T R^t\mathbf{1}$. Since $A$ is
arbitrary, for all $t \ge (1-\mu)^{-1}\ln(C_n/\varepsilon)$ we have

$$d_{TV}(X_t,\pi) \le \mathbf{1}^T R^t\mathbf{1} \le \|R\|^t\,\|\mathbf{1}^T\|\,\|\mathbf{1}\| = C_n\|R\|^t \le C_n\mu^t \le C_n e^{-(1-\mu)t} \le \varepsilon.$$ □

The following lemma and Lemma 17, whose proof follows, enable us to use
Lemma 29 to bound the mixing time of random update Glauber dynamics.

Lemma 30. Let $R$ be a dependency matrix for a spin system. Let $R^\dagger$ be
the random update matrix for $R$. Then for $f \in \mathbb{R}^M$, $\delta(P^\dagger f) \le R^\dagger\delta(f)$.

Proof. For each $i \in [n]$, from the definition of $\delta_i$, $\delta_i(f) \ge 0$ and, for any
$c \in \mathbb{R}$ and $f \in \mathbb{R}^M$, $\delta_i(cf) = |c|\delta_i(f)$. Also, $\delta_i(f_1 + f_2) \le \delta_i(f_1) + \delta_i(f_2)$ for
any $f_1, f_2 \in \mathbb{R}^M$. Now,

$$\delta(P^\dagger f) = \delta\Bigl(\Bigl(\frac{1}{n}\sum_{j=1}^n P^{[j]}\Bigr)f\Bigr) = \delta\Bigl(\frac{1}{n}\sum_{j=1}^n P^{[j]}f\Bigr) \le \frac{1}{n}\sum_{j=1}^n \delta(P^{[j]}f).$$

By Lemma 28, this is at most $\frac{1}{n}\sum_{j=1}^n R_j\delta(f) = R^\dagger\delta(f)$. □

Remark 5. The proof shows that $\delta_i(f)$ is a (vector) seminorm for all
$i \in [n]$. It fails to be a norm because $\delta_i(f) = 0$ does not imply $f = 0$. For
example, $\delta_i(\mathbf{1}) = 0$ for all $i \in [n]$.

We can now give a proof of Lemma 17 using this approach. 

Lemma 17. Suppose $R$ is a dependency matrix for a spin system, and
let $\|\cdot\|$ be any matrix norm. If $\|R\| \le \mu < 1$, then the mixing time $\tau_r(\varepsilon)$ of
random update Glauber dynamics is at most $n(1-\mu)^{-1}\ln(C_n/\varepsilon)$.

Proof. By Lemma 16, $\|R^\dagger\| \le \mu^\dagger = 1 - \frac{1}{n}(1-\mu)$ and by Lemma 30,
$\delta(P^\dagger f) \le R^\dagger\delta(f)$. Then by Lemma 29, $\tau_r(\varepsilon) \le (1-\mu^\dagger)^{-1}\ln(C_n/\varepsilon) =
n(1-\mu)^{-1}\ln(C_n/\varepsilon)$. □

Corollaries 18 and 19 and the rest of that section now follow exactly as 
before. A similar analysis applies to systematic scan, though it is slightly 
easier. It relies on the analogue of Lemma 30, which in this case is immediate 
from Lemma 28. 

Lemma 31. Let $R$ be a dependency matrix for a spin system. Let $\vec{R}$ be
the scan update matrix for $R$. Then for any $f \in \mathbb{R}^M$, $\delta(\vec{P}f) \le \vec{R}\delta(f)$. □

We can now give a proof of Lemma 22 using this approach.

Lemma 22. Let $R$ be a dependency matrix for a spin system, and $\|\cdot\|$
any matrix norm. If $\|\vec{R}\| \le \mu < 1$, the mixing time $\tau_s(\varepsilon)$ of systematic scan
Glauber dynamics is at most $(1-\mu)^{-1}\ln(C_n/\varepsilon)$. If $\|\cdot\|$ is an operator norm,
the mixing time is at most $(1-\mu)^{-1}\ln(J_n/\varepsilon)$.

Proof. By Lemma 31, for any $f \in \mathbb{R}^M$, $\delta(\vec{P}f) \le \vec{R}\delta(f)$. Then by
assumption, $\|\vec{R}\| \le \mu < 1$. Now apply Lemma 29. □

The results following Lemma 22 in Section 3.1 can then be obtained
identically to the proofs given there.

3.3. Improved analysis of systematic scan. We may improve the analysis
of Corollary 27 for the case in which the diagonal of $R$ is 0, which is the
case for the heat bath dynamics. For $\sigma > 0$, define $R^\sigma$ by

$$R^\sigma_{ij} = \begin{cases}\sigma\varrho_{ij}, & \text{if } 1 \le i < j \le n;\\ \varrho_{ij}, & \text{otherwise,}\end{cases}$$

so $R^\sigma$ has its upper triangle scaled by $\sigma$. Let $\varrho^\sigma_j$ denote the $j$th column of
$R^\sigma$, for $j \in [n]$. We can now prove the following strengthening of Lemma 23.

Lemma 32. If $wR^\sigma \le \sigma w$, for some $w \ge 0$ and $0 \le \sigma \le 1$, then $w\vec{R} \le
wR^\sigma$.

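Proof. We prove by induction that

$$wR_1R_2 \cdots R_i \le [w\varrho^\sigma_1 \cdots w\varrho^\sigma_i\ w_{i+1} \cdots w_n] \le [\sigma w_1 \cdots \sigma w_i\ w_{i+1} \cdots w_n].$$

The second inequality follows by assumption. The hypothesis is clearly true
for $i = 0$. For $i > 0$,

$$wR_1R_2 \cdots R_{i-1}R_i \le [w\varrho^\sigma_1 \cdots w\varrho^\sigma_{i-1}\ w_i\ w_{i+1} \cdots w_n]R_i = [w\varrho^\sigma_1 \cdots w\varrho^\sigma_{i-1}\ \tilde{w}\varrho_i\ w_{i+1} \cdots w_n],$$

where $\tilde{w} = [w\varrho^\sigma_1 \cdots w\varrho^\sigma_{i-1}\ w_i \cdots w_n] \le [\sigma w_1 \cdots \sigma w_{i-1}\ w_i \cdots w_n]$. It
follows that $\tilde{w}\varrho_i \le w\varrho^\sigma_i$, continuing the induction. Putting $i = n$ gives the
conclusion. □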

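Lemma 33. If $R$ is symmetric and $\lambda = \lambda(R) < 1$ then $\lambda(R^\sigma) \le \sigma$ if
$\sigma = \lambda/(2-\lambda)$.

Proof. We have $\lambda = \nu(R) = \|R\|_2$ by Lemma 12. Since $R$ is symmetric
with zero diagonal, $x^T R^\sigma x = \frac{1}{2}(1+\sigma)x^T R x$. It follows that $\lambda(R^\sigma) \le \nu(R^\sigma) =
\frac{1}{2}(1+\sigma)\nu(R) = \frac{1}{2}(1+\sigma)\lambda$. Therefore, $\lambda(R^\sigma) \le \sigma$ if $\lambda \le 2\sigma/(1+\sigma)$. This holds
if $\sigma \ge \lambda/(2-\lambda)$. □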
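The quadratic-form identity at the heart of Lemma 33 is easy to check numerically. The sketch below assumes $R^\sigma$ is obtained from $R$ by multiplying the entries strictly above the diagonal by $\sigma$, and verifies both $x^T R^\sigma x = \frac{1}{2}(1+\sigma)x^T R x$ and the fact that $\frac{1}{2}(1+\sigma)\lambda = \sigma$ when $\sigma = \lambda/(2-\lambda)$. The example matrix, the test vector and the helper names are illustrative.

```python
def scale_upper_triangle(r, sigma):
    """Return R^sigma: entries with row index < column index are multiplied by sigma."""
    n = len(r)
    return [
        [sigma * r[i][j] if i < j else r[i][j] for j in range(n)]
        for i in range(n)
    ]


def quad_form(a, x):
    """Return x^T A x."""
    n = len(a)
    return sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))


def critical_sigma(lam):
    """The value sigma = lambda / (2 - lambda) from Lemma 33."""
    return lam / (2.0 - lam)


# Hypothetical symmetric matrix with zero diagonal.
R = [
    [0.0, 0.2, 0.1, 0.0],
    [0.2, 0.0, 0.3, 0.1],
    [0.1, 0.3, 0.0, 0.2],
    [0.0, 0.1, 0.2, 0.0],
]

if __name__ == "__main__":
    sigma = 0.4
    x = [0.5, -1.0, 2.0, 0.25]
    lhs = quad_form(scale_upper_triangle(R, sigma), x)
    rhs = 0.5 * (1.0 + sigma) * quad_form(R, x)
    print("x^T R^sigma x       =", lhs)
    print("(1 + sigma)/2 x^T R x =", rhs)
    lam = 0.8
    s = critical_sigma(lam)
    print("sigma* =", s, " (1 + sigma*) lam / 2 =", 0.5 * (1.0 + s) * lam)
```

The identity relies on both symmetry and the zero diagonal; with nonzero diagonal entries the unscaled diagonal terms would break the factor $\frac{1}{2}(1+\sigma)$.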

Lemma 34. Let $R$ be symmetric with zero diagonal and $\|R\|_2 = \lambda(R) =
\lambda < 1$, and $0 < \eta < 1 - \lambda$. Let $\mu = \lambda + \eta < 1$. Then the mixing time of
systematic scan is at most

$$\tau_s(\varepsilon) \le (1 - \tfrac{1}{2}\mu)(1-\mu)^{-1}\ln(n/(\eta\varepsilon)).$$

Proof. Let $n' = n - 1$ and $S = R + \eta(J - I)/n'$. Since $S \ge R$, $S$ is a
dependency matrix for the original spin system. Also, $S$ is symmetric and
its diagonal is 0. Now

$$\lambda(S) = \|S\|_2 = \|R + \eta(J - I)/n'\|_2 \le \|R\|_2 + \eta\|J - I\|_2/n' \le \lambda + \eta = \mu.$$

Denote by $\vec{S} = S_1S_2 \cdots S_n$ the scan matrix. Let $\sigma = \mu/(2-\mu)$. Now by
Lemma 33, we have $\lambda(S^\sigma) \le \sigma$. Furthermore, $S^\sigma$ is irreducible, so by Lemma 7,
there exists a row vector $w > 0$ satisfying $wS^\sigma \le \sigma w$. We can assume without
loss of generality that $w$ is normalised so that $\|w\|_\infty = 1$. Finally, we can
conclude from Lemma 32 that $w\vec{S} \le wS^\sigma$.

Since $w\vec{S} \le wS^\sigma \le \sigma w$, we have established that convergence is geometric
with ratio $\sigma$, but we need a lower bound on $w_{\min} = \min_i w_i$ in order to obtain
an upper bound on mixing time via Lemma 29. Now

$$\sigma w \ge wS^\sigma \ge w(\sigma R + \sigma\eta(J - I)/n') \ge \sigma\eta w(J - I)/n' \ge (\sigma\eta/n')(\mathbf{1}^T - w).$$

So $w(1 + \eta/n') \ge (\eta/n')\mathbf{1}^T$, and $w_{\min} \ge \eta/(n' + \eta) \ge \eta/n$. By Corollary 26, the
mixing time satisfies

$$\tau_s(\varepsilon) \le (1-\sigma)^{-1}\ln(1/(w_{\min}\varepsilon)) \le (1-\sigma)^{-1}\ln(n/(\eta\varepsilon)),$$

and $(1-\sigma)^{-1} = (1 - \frac{1}{2}\mu)(1-\mu)^{-1}$. □
We can now prove Lemma 3.

Lemma 3. Let $R$ be symmetric with zero diagonal and $\|R\|_2 = \lambda(R) =
\lambda < 1$. Then the mixing time of systematic scan is at most

$$\hat\tau_s(\varepsilon) \sim (1 - \tfrac{1}{2}\lambda)(1-\lambda)^{-1}\ln((1-\lambda)^{-1}n/\varepsilon).$$

Proof. We apply Lemma 34 with $\eta = (1-\lambda)/\ln n$, and hence $\mu \sim \lambda$. □

Remark 6. If, as in Remark 3, $(1-\lambda) = \Omega(\log^{-k} n)$ for some $k > 0$,
then we have mixing time $\hat\tau_s(\varepsilon) \sim (1 - \frac{1}{2}\lambda)(1-\lambda)^{-1}\ln(n/\varepsilon)$ for systematic
scan. We may compare the number of Glauber steps $n\hat\tau_s(\varepsilon)$ with the estimate
$\hat\tau_r(\varepsilon) = (1-\lambda)^{-1}n\ln(n/\varepsilon)$ for random update Glauber dynamics obtained
from Corollary 18 using the minimum norm $\|\cdot\|_2$. The ratio is $(1 - \frac{1}{2}\lambda) < 1$.
This is close to $\frac{1}{2}$ when $\lambda(R)$ is close to 1, as in many applications.

Example 2. Consider coloring a $\Delta$-regular graph with $(2\Delta + 1)$ colors
([24, 33]) using heat bath Glauber dynamics. We have $\lambda(R) = \Delta/(\Delta + 1)$
(see Section 4). Then $(1-\lambda) = 1/(\Delta + 1) = \Theta(1)$, if $\Delta = O(1)$, and the above
ratio is $(1 - \frac{1}{2}\lambda) = (\Delta + 2)/(2\Delta + 2)$. This is close to $\frac{1}{2}$ for large $\Delta$.

Although the improvement in the mixing time bound is a modest constant
factor, this provides some evidence in support of the conjecture that
systematic scan mixes faster than random update, for Glauber dynamics at
least. The improvement is because we know, later in the scan, that most
vertices have already been updated. In a random update, some vertices are
updated many times before others are updated at all. Lemma 34 suggests
that this may be wasteful.
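The ratio in Example 2 can be tabulated exactly. The following sketch assumes $\lambda(R) = \Delta/(\Delta+1)$ as stated there and uses exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction


def scan_to_random_ratio(max_degree: int) -> Fraction:
    """Return 1 - lambda/2 with lambda = Delta / (Delta + 1)."""
    lam = Fraction(max_degree, max_degree + 1)
    return 1 - lam / 2


if __name__ == "__main__":
    for delta in (1, 2, 3, 10, 100, 1000):
        ratio = scan_to_random_ratio(delta)
        print(delta, ratio, float(ratio))
```

The values decrease towards $\frac{1}{2}$ as $\Delta$ grows, which matches the closed form $(\Delta+2)/(2\Delta+2)$.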

4. Coloring sparse graphs. In this section, we consider an application 
of the methods developed above to graph coloring problems, particularly in 
sparse graphs. By sparse, we will mean here that the number of edges of the 
graph is at most linear in its number of vertices. 

Let $G = (V,E)$, with $V = [n]$, be an undirected (simple) graph or multigraph,
without self-loops. Then $d_v$ will denote the degree of vertex $v \in V$. If
$S \subseteq V$, we will denote the induced subgraph by $G_S = (S,E_S)$. The (symmetric)
adjacency matrix $A(G)$ is a nonnegative integer matrix, with zero
diagonal, giving the number of edges between each pair of vertices. We write
$A$ for $A(G)$ and $\lambda(G)$ for $\lambda(A(G))$. Thus, the adjacency matrix of a graph is a
0-1 matrix. We also consider digraphs and directed multigraphs $\vec{G} = (V,\vec{E})$.
We denote the indegree and outdegree of $v \in V$ by $d_v^-$, $d_v^+$, respectively.

If $G$ is a graph with maximum degree $\Delta$, we consider the heat bath
Glauber dynamics for properly coloring $V$ with $q > \Delta$ colors. The dependency
matrix $R$ for this chain satisfies $\varrho_{ij} \le 1/(q - d_j)$ $(i,j \in [n])$ (see
Section 5.2 of [14]). Thus, $R = AD$, where $D = \mathrm{diag}(1/(q - d_j))$. Let $D^{1/2} =
\mathrm{diag}(1/\sqrt{q - d_j})$ and $\hat{A} = D^{1/2}AD^{1/2}$. Note that $\hat{A}$ is symmetric. Also, $\lambda(\hat{A}) =
\lambda(AD)$, since $(D^{1/2}AD^{1/2})(D^{1/2}x) = \lambda(D^{1/2}x)$ if and only if $ADx = \lambda x$. If
$(i,j) \in E$, we have $\hat{A}_{ij} = 1/\sqrt{(q - d_i)(q - d_j)}$. Since $\hat{A} \le \frac{1}{q-\Delta}A$, we have
$\lambda(\hat{A}) \le \frac{1}{q-\Delta}\lambda(A)$ from Lemma 9. So if $q > \Delta + \lambda(A)$, we can use Lemmas 1
and 2 to show that systematic scan and random update Glauber dynamics
both mix rapidly. For very nonregular
graphs, we may have $\lambda(\hat{A}) \ll \frac{1}{q-\Delta}\lambda(A)$. However, $\lambda(\hat{A})$ seems more difficult
to estimate than $\lambda(A)$, since it depends more on the detailed structure of $G$.
Therefore, we will use the bound $\frac{1}{q-\Delta}\lambda(A)$ in the remainder of this section,
and restrict most of the discussion to $\lambda(G)$. The following is well known.

Lemma 35. If $G$ has maximum degree $\Delta$ and average degree $\bar{d}$, then $\bar{d} \le
\lambda(G) \le \Delta$. If either bound is attained, there is equality throughout and $G$ is
$\Delta$-regular.

Proof. The vertex degrees of $G$ are the row or column sums of $A(G)$.
The upper bound then follows from $\lambda(G) \le \|A\|_1 = \max_{v\in V} d_v = \Delta$ using
Lemma 10. For the lower bound, since $G$ is undirected, $\lambda(G) = \nu(A) \ge
\mathbf{1}^T A\mathbf{1}/\mathbf{1}^T\mathbf{1} = 2|E|/n = \bar{d}$, using Lemma 12. If the lower bound is attained, then
the inequalities in the previous line are equalities, so $\mathbf{1}$ is an eigenvector of $A$.
Thus, $A\mathbf{1} = \bar{d}\mathbf{1}$, and every vertex has degree $\bar{d} = \Delta$. When the upper bound
is attained, since the column sums of $A$ are at most $\Delta$, $\mathbf{1}^T A \le \Delta\mathbf{1}^T = \lambda\mathbf{1}^T$,
so $\mathbf{1}^T$ is an eigenvector from Lemma 7 and $\mathbf{1}^T A = \Delta\mathbf{1}^T$. Then every vertex has
degree $\Delta = \bar{d}$. □



Thus, the resulting bound for coloring will be $q > 2\Delta$ when $G$ is $\Delta$-regular,
as already shown by Jerrum [24] or Salas and Sokal [33]. Thus, we can only
achieve mixing for $q \le 2\Delta$ by this approach if the degree sequence of $G$ is
nonregular.

We now derive a bound on X{R) for symmetric R which is very simple, 
but nonetheless can be used to provide good estimates in some applications. 

Lemma 36. Suppose $R \in M_n^+$, and we have $R = B + B^T$, for some $B \in
M_n$. If $\|\cdot\|$ is any matrix norm, then $\lambda(R) \le 2\sqrt{\|B\|\,\|B\|^*}$.

Proof. $\lambda(R) = \|B + B^T\|_2 \le \|B\|_2 + \|B^T\|_2 = 2\|B\|_2 \le 2\sqrt{\|B\|\,\|B\|^*}$,
using the self-duality of $\|\cdot\|_2$ and Lemmas 12 and 13. □

Corollary 37. If $R = B + B^T$, then $\lambda(R) \le 2\sqrt{\|B\|_1\|B\|_\infty}$.



We can use Corollary 37 as follows. If $R \in M_n^+$, let $\kappa(R) = \max_{I\subseteq[n]}\sum_{i,j\in I}\varrho_{ij}/2|I|$.
We call $\kappa(R)$ the maximum density of $R$. Note that $\kappa(R) \ge \frac{1}{2}\max_{i\in[n]}\varrho_{ii}$.
Thus, the maximum density $\kappa(G)$ of $A(G)$ for a graph or multigraph $G =
(V,E)$ is $\max_{S\subseteq V}|E_S|/|S|$, according with common usage. This measure will
be useful for sparse graphs. Note that the maximum density can be
computed in polynomial time [21]. Note also that, for symmetric $R \in M_n^+$, the
maximum density is a discrete version of the largest eigenvalue, since

$$\kappa(R) = \max_{x\in\{0,1\}^n}\frac{x^T R x}{2x^T x} \le \max_{x\in\mathbb{R}^n}\frac{x^T R x}{x^T x} = \nu(R) = \lambda(R).$$

Also, $\alpha(R) = \|R\|_1 \ge 2\kappa(R)$, since

$$\kappa(R) = \max_{I\subseteq[n]}\sum_{i,j\in I}\varrho_{ij}/2|I| \le \max_{I\subseteq[n]}\sum_{i\in I}\sum_{j\in[n]}\varrho_{ij}/2|I| \le \max_{I\subseteq[n]}\alpha(R)|I|/2|I| = \alpha(R)/2.$$
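For a small graph, the maximum density $\max_{S\subseteq V}|E_S|/|S|$ can be found by brute force. The sketch below enumerates all nonempty vertex subsets, so it takes exponential time and is meant only to illustrate the definition; it is not the polynomial-time method of [21]. The example graph (a complete graph on four vertices with one pendant vertex) and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def max_density(n, edges):
    """Return max over nonempty S of |E_S| / |S| for a simple graph on vertices 0..n-1."""
    best = Fraction(0)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            induced = sum(1 for u, v in edges if u in chosen and v in chosen)
            best = max(best, Fraction(induced, size))
    return best


def max_degree(n, edges):
    """Return the maximum vertex degree, which equals ||A||_1 for a simple graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree)


# K_4 on vertices 0..3 plus a pendant vertex 4 attached to vertex 3.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]

if __name__ == "__main__":
    kappa = max_density(5, EDGES)
    alpha = max_degree(5, EDGES)
    print("maximum density kappa =", kappa)
    print("alpha = max degree    =", alpha, " 2*kappa =", 2 * kappa)
```

Here the densest subgraph is the $K_4$ itself, with density $6/4$, while the whole graph has density $7/5$; the inequality $\alpha \ge 2\kappa$ holds with room to spare.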



We may easily bound the maximum density for some classes of graphs.
For any $a, b \in \mathbb{Z}$, let us define $\mathcal{G}(a,b)$ to be the maximal class of graphs such
that:

(i) $\mathcal{G}(a,b)$ is hereditary (closed under taking induced subgraphs);

(ii) for all $G = (V,E) \in \mathcal{G}(a,b)$ with $|V| = n$, we have $|E| \le an - b$.

Lemma 38. Let $G \in \mathcal{G}(a,b)$ with $|V| = n$. If:

(i) $b > 0$, then $\kappa(G) \le a - b/n$;

(ii) $b \le 0$, let $k^* = a + \frac{1}{2} + \sqrt{(a + \frac{1}{2})^2 - 2b}$, then $\kappa(G) \le \kappa^* = \max\{(\lfloor k^*\rfloor -
1)/2,\ a - b/\lceil k^*\rceil\}$.

Proof. In case (i), clearly $|E|/n \le a - b/n$. If $S \subset V$, $|E_S|/|S| \le a -
b/|S| \le a - b/n$. In case (ii), note that $\kappa(G) \le \frac{1}{n}\binom{n}{2} = \frac{1}{2}(n-1)$ for any simple
graph $G$ on $n$ vertices. Thus,

$$\kappa(G) \le \max_{1\le|S|\le n}\min\{(|S| - 1)/2,\ a - b/|S|\}.$$

Note that $(s-1)/2$ is increasing in $s$ and $a - b/s$ is decreasing in $s$. Also,
$s = k^*$ is the positive solution to $(s-1)/2 = a - b/s$. The other solution is
not positive since $b \le 0$. Thus

$$\kappa(G) \le \max\{(\lfloor k^*\rfloor - 1)/2,\ a - b/\lceil k^*\rceil\} = \kappa^*.$$ □

Remark 7. We could consider a more general class $\mathcal{G}(a_n,b_n)$, where
$|b_n| = o(na_n)$. This includes, for example, subgraphs of the $d$-dimensional
hypercubic grid with vertex set $V = [k]^d$ in which each interior vertex has
$2d$ neighbors. Then $|E| \le dn - dn^{1-1/d}$, so $a_n = d$ and $b_n = dn^{1-1/d}$. However,
we will not pursue this further here.

We can apply Lemma 38 directly to some classes of sparse graphs. 

For the definition of the tree-width $t(G)$ of a graph $G$, see [10]. We say
that a graph G has genus g if it can be embedded into a surface of genus g. 
See [7] for details, but note that that text (and several others) define the 
genus of the graph to be the smallest genus of all surfaces in which G can 
be embedded. We use our definition because it is appropriate for hereditary 
classes. Thus, for us a planar graph has genus 0, and a graph which can be 
embedded in the torus has genus 1 (whether or not it is planar). 

Lemma 39. If a graph $G = (V,E)$ is:

(i) a nonregular connected graph with maximum degree $\Delta$, then $G \in
\mathcal{G}(\Delta/2, 1)$;

(ii) a forest, then $G \in \mathcal{G}(1,1)$;

(iii) a graph of tree-width $t$, then $G \in \mathcal{G}(t, t(t+1)/2)$;

(iv) a planar graph, then $G \in \mathcal{G}(3,6)$;

(v) a graph of genus $g$, then $G \in \mathcal{G}(3, 6(1-g))$.

Proof. Note that (ii) is a special case of (iii), and (iv) is a special case
of (v). For (i), if $G_S = (S,E_S)$ is an induced subgraph of $G$, then $G_S$ cannot
be $\Delta$-regular, and $|E_S| \le \frac{\Delta}{2}|S| - 1$. For (iii) and (v), the graph properties
of having tree-width at most $t$, or genus at most $g$, are hereditary. Also, if
$|V| = n$, a graph of tree-width $t$ has at most $tn - t(t+1)/2$ edges (see, e.g., [2],
Theorem 1, Theorem 34), and a graph of genus $g$ at most $3n - 6(1-g)$ edges
(see, e.g., [7], Theorem 7.5, Corollary 7.9). □

Remark 8. In (i)-(iv) of Lemma 39, we have $b > 0$, but observe that
in (v) we have $b > 0$ if $g = 0$ (planar), $b = 0$ if $g = 1$ (toroidal) and $b < 0$ if
$g > 1$.

Corollary 40. If a graph $G = (V,E)$ on $n$ vertices is:

(i) a nonregular connected graph with maximum degree $\Delta$, then $\kappa(G) \le
\frac{\Delta}{2} - \frac{1}{n}$;

(ii) a forest, then $\kappa(G) \le 1 - \frac{1}{n}$;

(iii) a graph of tree-width $t$, then $\kappa(G) \le t - \frac{t(t+1)}{2n}$;

(iv) a planar graph, then $\kappa(G) \le 3 - \frac{6}{n}$;

(v) a graph of genus $g > 0$, let $k_g = \frac{7}{2} + \sqrt{12g + \frac{1}{4}}$, then

$$\kappa(G) \le \kappa_g = \max\{(\lfloor k_g\rfloor - 1)/2,\ 3 + 6(g-1)/\lceil k_g\rceil\}.$$

Proof. Follows directly from Lemmas 38 and 39. □
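The genus bound of Corollary 40(v) is simple to evaluate. The sketch below assumes $k_g = \frac{7}{2} + \sqrt{12g + \frac{1}{4}}$ and $\kappa_g = \max\{(\lfloor k_g\rfloor - 1)/2,\ 3 + 6(g-1)/\lceil k_g\rceil\}$, and prints $\kappa_g$ alongside the density $\frac{1}{2}(\lfloor k_g\rfloor - 1)$ of the complete graph on $\lfloor k_g\rfloor$ vertices. The function names are illustrative, and floating-point square roots are used, so values of $g$ for which $k_g$ is extremely close to an integer would deserve exact arithmetic.

```python
import math


def k_genus(g: int) -> float:
    """Return k_g = 7/2 + sqrt(12 g + 1/4) for genus g >= 1."""
    return 3.5 + math.sqrt(12 * g + 0.25)


def kappa_genus(g: int) -> float:
    """Return the upper bound kappa_g on maximum density for genus g >= 1."""
    k = k_genus(g)
    return max((math.floor(k) - 1) / 2, 3 + 6 * (g - 1) / math.ceil(k))


def complete_graph_density(g: int) -> float:
    """Density (m - 1)/2 of the complete graph on m = floor(k_g) vertices."""
    return (math.floor(k_genus(g)) - 1) / 2


if __name__ == "__main__":
    for g in range(1, 11):
        print(g, round(k_genus(g), 4), round(kappa_genus(g), 4), complete_graph_density(g))
```

When $k_g$ is an integer (for example $g = 1$ and $g = 6$) the two arguments of the maximum coincide and the upper bound equals the complete-graph density.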

Fig. 2. Upper and lower bounds on maximum density for small genus $g$.

Remark 9. Suppose that $g$ is chosen so that $k_g$ is an integer. The bound
in Corollary 40(v) gives $\kappa_g = (k_g - 1)/2$ (because $k_g$ is the point at which
the two arguments to the maximum are equal). The bound says that for
every graph $G$ with genus $g$, $\kappa(G) \le \kappa_g$. This bound is tight because there is
a graph $G$ with density $\kappa(G) = \kappa_g$ and genus $g$. In particular, the complete
graph $K_{k_g}$ has density $\kappa_g$. If $k_g \ge 3$, it also has genus $g$. The smallest genus
of a surface in which it can be embedded is $\gamma = \lceil(k_g - 3)(k_g - 4)/12\rceil$ (see,
e.g., [7], Theorem 7.10). This is at least $g$ (in fact, $\gamma = g$, though we do not
use this fact here) since

$$\frac{k_g^2 - 7k_g + 12}{12} = g,$$

so the genus of $G$ is $g$ as required. The bound in Corollary 40(v) may not
be tight for those $g$ for which $k_g$ is not integral. However, the bound is not
greatly in error. Consider any $g > 0$. The graph $G = K_{\lfloor k_g\rfloor}$ can be embedded
in a surface of genus $g$ so it has genus $g$. Also, as noted above, $\kappa(G) =
\frac{1}{2}(\lfloor k_g\rfloor - 1)$. If the bound is not tight for this $g$ and $G$ then



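(7) $\kappa(G) < \kappa_g = 3 + \frac{6(g-1)}{\lceil k_g\rceil} \le 3 + \frac{6(g-1)}{k_g} = \frac{k_g - 1}{2} < \frac{\lfloor k_g\rfloor - 1}{2} + \frac{1}{2} = \kappa(G) + \frac{1}{2},$

so $\kappa_g$ cannot be too much bigger than $\kappa(G)$. It is easy to see that $\kappa_g \sim \sqrt{3g}$
for large $g$. For small $g$, a plot of the upper bound $\kappa_g$ on maximum density
is shown in Figure 2, together with the lower bound $\frac{1}{2}(\lfloor k_g\rfloor - 1)$.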

We now show that there exists a suitable B for applying Corollary 37. 

Lemma 41. Let R G be symmetric with maximum density k and let 
a = Then there exists B S such that R = B + B^ and \\B\\i = k, 

ll-Blloo = a- K. 



Proof. It will be sufficient to show that ||i?||i<K, ||i?||oo < a — since 
then we have 



(8) 



a 



\R\\i 



B + B'^\\i<\\B\\i + \\B'^h 



< K + [a 



■ a. 



First suppose $R$ is rational. Note that $\kappa$ is then also rational. Let $R' =
R - D$, where $D = \mathrm{diag}(\varrho_{ii})$. Thus, for some large enough integer $N > 0$,
$A(G) = NR'$ is the adjacency matrix of an undirected multigraph $G = (V, E)$
with $V = [n]$, $ND$ is a matrix of even integers, and $N\kappa$ is an integer. Thus,
provided $B$ is eventually rescaled to $B/N$, we may assume these properties
hold for $R'$, $D$ and $\kappa$. An orientation of $G$ is a directed multigraph $\vec{G} = (V, \vec{E})$
such that exactly one of $e^+ = (v, w)$, $e^- = (w, v)$ is in $\vec{E}$ for every $e = \{v, w\} \in
E$. Clearly, $A(G) = A(\vec{G}) + A(\vec{G})^T$, so we may take $B = A(\vec{G}) + \frac{1}{2}D$. Note
that $\|B\|_1 = \max_{v \in V}(d_v^- + \frac{1}{2}\varrho_{vv})$ and $\|B\|_\infty = \max_{v \in V}(d_v^+ + \frac{1}{2}\varrho_{vv})$. We now
apply the following (slightly restated) theorem of Frank and Gyárfás [20].

Theorem 42 (Frank and Gyárfás [20]). Suppose $\ell_v \le u_v$ for all $v \in V$ in
an undirected multigraph $G = (V, E)$. Then $G$ has an orientation $\vec{G}$ satisfying
$\ell_v \le d_v^- \le u_v$ if and only if, for all $S \subseteq V$, we have
$|E_S| \le \min\{\sum_{v \in S} u_v, \sum_{v \in S} (d_v - \ell_v)\}$.

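We will take $u_v = \kappa - \frac{1}{2}\varrho_{vv}$, $\ell_v = d_v + \kappa - \alpha + \frac{1}{2}\varrho_{vv}$. Then $\ell_v \le u_v$, since
$d_v \le (\alpha - \varrho_{vv})$, and $(d_v - \ell_v) \ge u_v$, since $\alpha \ge 2\kappa$. The conditions of Theorem 42
are satisfied, since for all $S \subseteq V$,

$$|E_S| = \frac{1}{2}\sum_{v,w \in S} \varrho_{vw} - \frac{1}{2}\sum_{v \in S} \varrho_{vv}
\le \kappa|S| - \frac{1}{2}\sum_{v \in S} \varrho_{vv} = \sum_{v \in S} u_v \le \sum_{v \in S} (d_v - \ell_v).$$

The result now follows for rational $R$, since we have

$$\|B\|_1 = \max_{v \in V}(d_v^- + \tfrac{1}{2}\varrho_{vv}) \le \max_{v \in V}(u_v + \tfrac{1}{2}\varrho_{vv}) = \kappa,$$

$$\|B\|_\infty = \max_{v \in V}(d_v^+ + \tfrac{1}{2}\varrho_{vv}) \le \max_{v \in V}(d_v - \ell_v + \tfrac{1}{2}\varrho_{vv}) = \alpha - \kappa.$$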

If R is irrational, standard continuity arguments now give the conclusion. 

□ 

Remark 10. The use of Theorem 42 in the proof can be replaced by 
an application of the max-flow min-cut theorem, as in [21], but Theorem 42 
seems more easily applicable here. 

We can show that Lemma 41 is best possible, in the following sense. 

Lemma 43. Let $R \in M_n^+$ be symmetric with maximum density $\kappa$ and let
$\alpha = \|R\|_1$. If $R = B + B^T$ for any $B \in M_n$, then $\|B\|_1 \ge \kappa$ and
$\|B\|_\infty \ge \alpha - \|B\|_1$.

Proof. Let $I$ be any set achieving the maximum density of $R$. Then

$$2|I|\kappa = \sum_{i,j \in I} \varrho_{ij} \le \sum_{i,j \in I} (|B_{ij}| + |B_{ji}|)
= 2\sum_{i,j \in I} |B_{ij}| \le 2\sum_{j \in I}\sum_{i \in [n]} |B_{ij}| \le 2|I|\,\|B\|_1,$$

so $\|B\|_1 \ge \kappa$. The second assertion follows from (8). □

Theorem 44. If $R \in M_n^+$ is a symmetric matrix with maximum density
$\kappa$ and $\alpha = \|R\|_1$, then $\lambda(R) \le 2\sqrt{\kappa(\alpha - \kappa)}$.

Proof. Follows directly from Corollary 37 and Lemma 41. □ 
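Theorem 44 can be checked numerically on a small example. The following is a minimal sketch, assuming that the maximum density of $R$ is $\max_I \sum_{i,j \in I} \varrho_{ij} / (2|I|)$ (the normalisation used in the proof of Lemma 43) and that $\lambda(R)$ is the spectral radius. The helper names are illustrative, the density is found by brute force over all subsets (feasible only for tiny matrices), and the spectral radius is estimated by power iteration on $R + I$, which is a convenience chosen here rather than anything used in the paper.

```python
from itertools import combinations
from math import sqrt

def max_density(R):
    """Brute-force maximum over nonempty I of sum_{i,j in I} R[i][j] / (2|I|)."""
    n = len(R)
    best = 0.0
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            total = sum(R[i][j] for i in I for j in I)
            best = max(best, total / (2 * size))
    return best

def one_norm(R):
    """Maximum absolute column sum."""
    n = len(R)
    return max(sum(abs(R[i][j]) for i in range(n)) for j in range(n))

def spectral_radius(R, iters=500):
    """Power iteration on R + I for a symmetric nonnegative R.

    The shift by I avoids oscillation on bipartite graphs; the Rayleigh
    quotient of the limiting unit vector with respect to R is returned.
    """
    n = len(R)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    Rx = [sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Rx[i] for i in range(n))

# Adjacency matrix of the star with three leaves.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
kappa = max_density(star)
alpha = one_norm(star)
bound = 2 * sqrt(kappa * (alpha - kappa))
lam = spectral_radius(star)
print(kappa, alpha, lam, bound)
```

For this star the density is attained by the whole vertex set (three edges on four vertices), and the spectral radius $\sqrt{3}$ lies below the bound of Theorem 44.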

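Remark 11. Since $\kappa(\alpha - \kappa)$ is increasing for $\kappa \le \alpha/2$, an upper bound
$\kappa'$ can be used, as long as we ensure that $\kappa' \le \alpha/2$.

Remark 12. We can adapt this for asymmetric $R$ by considering the
"symmetrization" $\frac{1}{2}(R + R^T)$. Note that $\kappa(R) = \kappa(\frac{1}{2}(R + R^T))$. Let $\alpha(R) =
\|\frac{1}{2}(R + R^T)\|_1 \le \frac{1}{2}(\|R\|_1 + \|R\|_\infty)$. We also have $\lambda(R) \le \nu(R) = \nu(\frac{1}{2}(R +
R^T)) = \lambda(\frac{1}{2}(R + R^T))$. Then $\lambda(R) \le 2\sqrt{\kappa(\alpha - \kappa)}$.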

The following application, used together with Lemma 2, strengthens [14], 
Theorem 15. 

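Theorem 45. Suppose $R$ is a symmetric and irreducible dependency
matrix with row sums at most 1, and suppose $0 < \gamma \le \min_{i,j \in [n]}\{\varrho_{ij} : \varrho_{ij} > 0\}$.
If there is any row with sum at most $1 - \gamma$, then $\lambda(R) \le \sqrt{1 - \gamma^2/n^2} \le
1 - \gamma^2/2n^2$.

Proof. Since $R$ is irreducible, for any $I \subset [n]$, $\sum_{i,j \in I} \varrho_{ij} \le |I| - \gamma$. This
also holds for $I = [n]$ by assumption. Thus, $\kappa \le \frac{1}{2} - \frac{\gamma}{2n}$. Since $\|R\|_1 \le 1$,
we have $\lambda(R) \le 2\sqrt{(\frac{1}{2} - \frac{\gamma}{2n})(\frac{1}{2} + \frac{\gamma}{2n})} = \sqrt{1 - \gamma^2/n^2}$. The final inequality is
easily verified. □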

We can also apply Theorem 44 straightforwardly to (simple) graphs. 

Corollary 46. If $G$ has maximum density $\kappa$, and maximum degree $\Delta$,
then $\lambda(G) \le 2\sqrt{\kappa(\Delta - \kappa)}$.

Proof. In Theorem 44, we have $\alpha = \Delta$. □

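Theorem 47. If $G = (V,E) \in \mathcal{G}(a,b)$, with $b > 0$, $\Delta \ge 2a$ and $|V| = n$,
then

$$\lambda(G) \le 2\sqrt{\left(a - \frac{b}{n}\right)\left(\Delta - a + \frac{b}{n}\right)}
\le \begin{cases}
\sqrt{a(\Delta - a)}\left(2 - \dfrac{b(\Delta - 2a)}{a(\Delta - a)n}\right), & \text{if } \Delta > 2a; \\
a\left(2 - \dfrac{b^2}{a^2 n^2}\right), & \text{if } \Delta = 2a.
\end{cases}$$

Proof. The first inequality follows directly from Lemma 38 and Corollary 46.
Note that the condition $\Delta \ge 2a - 2b/n$ is required in view of Remark 11.
For the second, squaring gives

$$4a(\Delta - a) - \frac{4b(\Delta - 2a)}{n} - \frac{4b^2}{n^2}
\le 4a(\Delta - a) - \frac{4b(\Delta - 2a)}{n} + \frac{b^2(\Delta - 2a)^2}{a(\Delta - a)n^2},$$

which holds for all $b$ and $\Delta > 2a$. When $\Delta = 2a$, using $\sqrt{1 - x} \le 1 - \frac{1}{2}x$,

$$\lambda(G) \le 2\sqrt{a^2 - \frac{b^2}{n^2}} = 2a\sqrt{1 - \frac{b^2}{a^2 n^2}}
\le 2a\left(1 - \frac{b^2}{2a^2 n^2}\right) = a\left(2 - \frac{b^2}{a^2 n^2}\right).$$ □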

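Theorem 48. If $G = (V,E) \in \mathcal{G}(a,b)$, with $b \le 0$ and $|V| = n$, let $k^* =
a + \frac{1}{2} + \sqrt{(a + \frac{1}{2})^2 - 2b}$ and $\kappa^* = \max\{(\lfloor k^* \rfloor - 1)/2,\ a - b/\lceil k^* \rceil\}$. Then, if
$\Delta \ge 2\kappa^*$, $\lambda(G) \le 2\sqrt{\kappa^*(\Delta - \kappa^*)}$.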

Proof. This follows immediately from Lemma 38, Theorem 44 and Re- 
mark 11. □ 

We can apply this to the examples from Lemma 39. 

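Corollary 49. If $G = (V,E)$, with maximum degree $\Delta$ and $|V| = n$,
is:

(i) A connected nonregular graph, then

$$\lambda(G) \le \sqrt{\Delta^2 - \frac{4}{n^2}} \le \Delta - \frac{2}{\Delta n^2}.$$

(ii) A tree with $\Delta > 2$, then

$$\lambda(G) \le 2\sqrt{\left(1 - \frac{1}{n}\right)\left(\Delta - 1 + \frac{1}{n}\right)}
\le \sqrt{\Delta - 1}\left(2 - \frac{\Delta - 2}{(\Delta - 1)n}\right).$$

If $\Delta = 2$, then $\lambda(G) \le 2 - 1/n^2$, and if $\Delta < 2$, then $\lambda(G) = \Delta$.

(iii) A graph with tree-width at most $t$ and $\Delta > 2t$, then

$$\lambda(G) \le 2\sqrt{\left(t - \frac{t(t+1)}{2n}\right)\left(\Delta - t + \frac{t(t+1)}{2n}\right)}.$$

If $\Delta = 2t$, then $\lambda(G) \le 2t - t(t+1)^2/4n^2$.

(iv) A planar graph with $\Delta > 6$, then

$$\lambda(G) \le 2\sqrt{(3 - 6/n)(\Delta - 3 + 6/n)} \le 2\sqrt{3(\Delta - 3)}\left(1 - \frac{\Delta - 6}{(\Delta - 3)n}\right).$$

If $\Delta = 6$, $\lambda(G) \le 6 - 12/n^2$. If $\Delta \le 5$, $\lambda(G) \le \Delta$ is best possible.

(v) A graph of genus $g > 0$, let $k_g = \frac{7}{2} + \sqrt{12g + \frac{1}{4}}$ and
$\kappa_g = \max\{(\lfloor k_g \rfloor - 1)/2,\ 3 + 6(g - 1)/\lceil k_g \rceil\}$. If $\Delta \ge 2\kappa_g$, then

$$\lambda(G) \le 2\sqrt{\kappa_g(\Delta - \kappa_g)}.$$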
Proof. Using Lemma 39, these follow using Theorem 47 and Theorem 48
with:

(i) $a = \Delta/2$, $b = 1$ and $\Delta = 2a$;

(ii) $a = 1$, $b = 1$, if $\Delta > 2$. If $\Delta = 2$, the result follows from the $\Delta = 2a$
case. If $\Delta = 1$, $G$ is a single edge and, if $\Delta = 0$, an isolated vertex;

(iii) $a = t$, $b = t(t + 1)/2$;

(iv) $a = 3$, $b = 6$. If $\Delta \le 5$, regular planar graphs with degree $\Delta$ exist,
and we use Lemma 35;

(v) $a = 3$, $b = -6(g - 1)$. □
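As a sanity check on the tree case, the bounds of Corollary 49(ii) can be compared with two families of trees whose largest adjacency eigenvalue is known in closed form: the path on $n$ vertices, with $\lambda = 2\cos(\pi/(n+1))$, and the star with $m$ leaves, with $\lambda = \sqrt{m}$. The following is a minimal sketch; the function names are illustrative and are not taken from the paper.

```python
from math import sqrt, cos, pi

def tree_bound(max_deg, n):
    """First bound of Corollary 49(ii), with the special cases for max_deg <= 2."""
    if max_deg < 2:
        return float(max_deg)
    if max_deg == 2:
        return 2 - 1 / n ** 2
    return 2 * sqrt((1 - 1 / n) * (max_deg - 1 + 1 / n))

def tree_bound_simplified(max_deg, n):
    """Second (simpler, weaker) bound of Corollary 49(ii), for max_deg > 2."""
    return sqrt(max_deg - 1) * (2 - (max_deg - 2) / ((max_deg - 1) * n))

def path_radius(n):
    """Largest adjacency eigenvalue of the path on n vertices."""
    return 2 * cos(pi / (n + 1))

def star_radius(m):
    """Largest adjacency eigenvalue of the star with m leaves (m + 1 vertices)."""
    return sqrt(m)

for n in (3, 10, 100):
    print("path", n, path_radius(n), tree_bound(2, n))
for m in (3, 10, 100):
    print("star", m, star_radius(m), tree_bound(m, m + 1), tree_bound_simplified(m, m + 1))
```

The path is close to extremal when $\Delta = 2$, whereas the star is far from the bound, which is governed by $\Delta$ and $n$ alone.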

Remark 13. If $G$ is a disconnected graph, the component having the
largest eigenvalue determines $\lambda(G)$, using Lemma 8. This can be applied to
a forest.

Remark 14. Corollary 49(i) improves on a result of Stevanović [37],
who showed that

$$\lambda(G) < \Delta - \frac{1}{2n(n\Delta - 1)\Delta^2}.$$

This was improved by Zhang [40] to (approximately) A — ^(An)~^, which 
is still inferior to (i). But recently the bound has been improved further by 
Cioabă, Gregory and Nikiforov [6], who showed

$$\lambda(G) < \Delta - \frac{1}{n(D + 1)},$$

where $D$ is the diameter of $G$. This gives $\lambda(G) < \Delta - 1/n^2$ even in the
worst case, which significantly improves on (i). However, Corollary 49 is an
easy consequence of the general Corollary 46, whereas [6] uses a calculation
carefully tailored for this application.

Remark 15. When $G$ is a degree-bounded forest, Corollary 49(ii)
strengthens another result of Stevanović [36], who showed $\lambda(G) < 2\sqrt{\Delta - 1}$.

Remark 16. When $G$ is a planar graph, Corollary 49(iv) improves a
result of Hayes [22].

We can now apply these results to the mixing of Glauber dynamics for
proper colorings in the classes of sparse graphs $\mathcal{G}(a,b)$.

Theorem 50. Let $G = (V,E) \in \mathcal{G}(a,b)$, with $b > 0$, have maximum
degree $\Delta \ge 2a$, where $|V| = n$. Let $\psi = 2\sqrt{a(\Delta - a)}$, $\phi = \Delta - 2a$ and $\mu =
\psi/(q - \Delta)$. Then, if:

(i) $q > \Delta + \psi$, the random update and systematic scan Glauber dynamics
mix in time

$$\tau_r(\varepsilon) \le (1 - \mu)^{-1} n \ln(n/\varepsilon), \qquad \hat\tau_s(\varepsilon) \sim (1 - \tfrac{1}{2}\mu)(1 - \mu)^{-1} \ln(n/\varepsilon).$$

(ii) $q = \Delta + \psi$ and $\phi > 0$, the random update and systematic scan Glauber
dynamics mix in time

$$\tau_r(\varepsilon) \le (\psi^2/2b\phi)\, n^2 \ln(n/\varepsilon), \qquad \hat\tau_s(\varepsilon) \sim (\psi^2/2b\phi)\, n \ln(n/\varepsilon).$$

(iii) $q = \Delta + \psi$ and $\phi = 0$, the random update and systematic scan Glauber
dynamics mix in time

$$\tau_r(\varepsilon) \le 2(a/b)^2 n^3 \ln(n/\varepsilon), \qquad \hat\tau_s(\varepsilon) \sim 3(a/b)^2 n^2 \ln(n/\varepsilon).$$

Proof. Recall from the beginning of Section 4 that $\lambda(R) \le \lambda(G)/(q -
\Delta)$ where $\lambda(G)$ denotes $\lambda(A(G))$. Note also that, if $\psi$ is not an integer,
then $q - \Delta - \psi = \Omega(1)$. By Theorem 47, for (i) we have $\|R\|_2 = \lambda(R) \le
\lambda(G)/(q - \Delta) \le \psi/(q - \Delta) = \mu < 1$. For (ii), we have $\lambda(R) \le 1 - (2b\phi/\psi^2 n)$,
and for (iii), $\lambda(R) \le 1 - (b^2/2a^2n^2)$. The conclusions for $\tau_r(\varepsilon)$ follow from
Lemma 17, and those for $\hat\tau_s(\varepsilon)$ from Lemma 3. For (ii) and (iii), factors of
$\frac{1}{2}$ arise in Lemma 3 since $\lambda \sim 1$, but additional factors (2 and 3, resp.) come
from the log term. □
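The three regimes of Theorem 50 can be collected into one small calculator. The following is a minimal sketch under the assumptions stated in the theorem ($b > 0$, $\Delta \ge 2a$, $q \ge \Delta + \psi$); the function name `theorem50_bounds` and the floating-point tolerances are choices made here for illustration, and the second returned value is only the leading-order estimate for systematic scan, not a strict bound.

```python
from math import sqrt, log, isclose

def theorem50_bounds(a, b, max_deg, q, n, eps):
    """Return (tau_r bound, leading-order tau_s estimate) following Theorem 50."""
    if b <= 0 or max_deg < 2 * a:
        raise ValueError("Theorem 50 needs b > 0 and max_deg >= 2a")
    psi = 2 * sqrt(a * (max_deg - a))
    phi = max_deg - 2 * a
    L = log(n / eps)
    if isclose(q, max_deg + psi):
        if phi > 1e-12:
            # Case (ii): q = max_deg + psi and phi > 0.
            c = psi ** 2 / (2 * b * phi)
            return c * n ** 2 * L, c * n * L
        # Case (iii): q = max_deg + psi and phi = 0.
        c = (a / b) ** 2
        return 2 * c * n ** 3 * L, 3 * c * n ** 2 * L
    if q > max_deg + psi:
        # Case (i): q > max_deg + psi.
        mu = psi / (q - max_deg)
        return n * L / (1 - mu), (1 - mu / 2) / (1 - mu) * L
    raise ValueError("Theorem 50 needs q >= max_deg + psi")

# Connected nonregular graph with max degree 4 (a = 2, b = 1) and q = 8 colors.
print(theorem50_bounds(2.0, 1.0, 4, 8, 10, 0.1))
# Planar graph with max degree 7 (a = 3, b = 6) and q = 20 colors.
print(theorem50_bounds(3.0, 6.0, 7, 20, 100, 0.01))
```

With $a = \Delta/2$, $b = 1$ and $q = 2\Delta$ the function falls into case (iii) and gives $\frac{1}{2}\Delta^2 n^3 \ln(n/\varepsilon)$ for random update.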



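Theorem 51. If $G = (V,E) \in \mathcal{G}(a,b)$ with $b \le 0$, let $k^* = a + \frac{1}{2} + \sqrt{(a + \frac{1}{2})^2 - 2b}$
and $\kappa^* = \max\{(\lfloor k^* \rfloor - 1)/2,\ a - b/\lceil k^* \rceil\}$. If $\Delta \ge 2\kappa^*$, let $\psi = 2\sqrt{\kappa^*(\Delta - \kappa^*)}$ and
$\mu = \psi/(q - \Delta)$. Then, if $q > \Delta + \psi$,

$$\tau_r(\varepsilon) \le (1 - \mu)^{-1} n \ln(n/\varepsilon), \qquad \hat\tau_s(\varepsilon) \sim (1 - \tfrac{1}{2}\mu)(1 - \mu)^{-1} \ln(n/\varepsilon).$$

Proof. From Theorem 48, we have

$$\|R\|_2 = \lambda(R) \le \frac{\lambda(G)}{q - \Delta} \le \frac{\psi}{q - \Delta} = \mu < 1.$$

The conclusions for $\tau_r(\varepsilon)$ now follow from Lemmas 14 and 17, and those for
$\hat\tau_s(\varepsilon)$ from Lemma 3. □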



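Corollary 52. If $G = (V,E)$, with $|V| = n$ and maximum degree $\Delta$,
is:

(i) A nonregular connected graph and $q = 2\Delta$, then

$$\tau_r(\varepsilon) \le \tfrac{1}{2}\Delta^2 n^3 \ln(n/\varepsilon), \qquad \hat\tau_s(\varepsilon) \sim \tfrac{3}{4}\Delta^2 n^2 \ln(n/\varepsilon).$$

(ii) A graph with tree-width $t$ and $\Delta \ge 2t$, let $\psi = 2\sqrt{t(\Delta - t)}$. Then

$$\tau_r(\varepsilon) \le \begin{cases}
(q - \Delta)(q - \Delta - \psi)^{-1} n \ln(n/\varepsilon), & \text{if } q > \Delta + \psi; \\
\psi^2 (t(t+1)(\Delta - 2t))^{-1} n^2 \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta > 2t; \\
8(t+1)^{-2} n^3 \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta = 2t.
\end{cases}$$

$$\hat\tau_s(\varepsilon) \sim \begin{cases}
(q - \Delta - \frac{1}{2}\psi)(q - \Delta - \psi)^{-1} \ln(n/\varepsilon), & \text{if } q > \Delta + \psi; \\
\psi^2 (t(t+1)(\Delta - 2t))^{-1} n \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta > 2t; \\
12(t+1)^{-2} n^2 \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta = 2t.
\end{cases}$$

(iii) A planar graph and $\Delta \ge 6$, let $\psi = 2\sqrt{3(\Delta - 3)}$. Then

$$\tau_r(\varepsilon) \le \begin{cases}
(q - \Delta)(q - \Delta - \psi)^{-1} n \ln(n/\varepsilon), & \text{if } q > \Delta + \psi; \\
\psi^2 (12(\Delta - 6))^{-1} n^2 \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta > 6; \\
\frac{1}{2} n^3 \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta = 6.
\end{cases}$$

$$\hat\tau_s(\varepsilon) \sim \begin{cases}
(q - \Delta - \frac{1}{2}\psi)(q - \Delta - \psi)^{-1} \ln(n/\varepsilon), & \text{if } q > \Delta + \psi; \\
\psi^2 (12(\Delta - 6))^{-1} n \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta > 6; \\
\frac{3}{4} n^2 \ln(n/\varepsilon), & \text{if } q = \Delta + \psi \text{ and } \Delta = 6.
\end{cases}$$

(iv) A graph of genus $g > 0$, let $k_g = \frac{7}{2} + \sqrt{12g + \frac{1}{4}}$, $\kappa_g = \max\{(\lfloor k_g \rfloor -
1)/2,\ 3 + 6(g - 1)/\lceil k_g \rceil\}$ and $\psi = 2\sqrt{\kappa_g(\Delta - \kappa_g)}$. If $\Delta \ge 2\kappa_g$ and $q > \Delta + \psi$,
then

$$\tau_r(\varepsilon) \le (q - \Delta)(q - \Delta - \psi)^{-1} n \ln(n/\varepsilon),$$
$$\hat\tau_s(\varepsilon) \sim (q - \Delta - \tfrac{1}{2}\psi)(q - \Delta - \psi)^{-1} \ln(n/\varepsilon).$$

Proof. This follows directly from Lemma 39 and Theorems 50 and 51. □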



Remark 17. Corollary 52(i) bounds the mixing time of heat bath Glauber
dynamics for sampling proper $q$-colorings of a nonregular graph $G$ with
maximum degree $\Delta$ when $q = 2\Delta$. (We can bound the mixing time for a
disconnected graph $G$ by considering the components.) It is also possible to
extend the mixing time result for nonregular graphs to regular graphs using
the decomposition method of Martin and Randall [26]. See [14], Section 5,
for details about how to do this. The use of our Corollary 52(i) improves
Theorem 5 of [14] by a factor of $n$.